The Minneapolis firm Colectica recently completed a project resulting from their third SBIR grant through the National Institutes of Health (NIH). The company focuses on corralling the endless supply of unstructured data through Open Standards such as those developed by the Data Documentation Initiative (DDI), and they have found their niche with research organizations responsible for securely analyzing data within the social, behavioral, and economic sciences.
Colectica was spun out of Algenta Technologies, a software company that offered a variety of services, including DNS hosting, custom development projects, and a series of contracts with the University of Minnesota Computer Science Department. That experience and capital allowed the founders to pursue their interests in data integrity.
Co-founder Dan Smith describes their platform as “enabling organizations to document the complete life cycle of data. This includes recording why and how the data were created, by whom, for what purposes, where it can be accessed, its representation, and what each piece of data is comparable to. This fine-grained description of data, the concepts captured, and the data processes are all recorded using Open Standards from ISO/IEC and research consortiums.”
The data life cycle starts with a study concept, which is transformed through data collection and other standardized processes before it is analyzed. This complete data workflow, which underlies the Colectica platform, serves as a foundation for data analysis. The metadata describing each step of this life cycle is stored in a repository, which is essentially a version control system for metadata and relationships.
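To make the idea of "a version control system for metadata and relationships" concrete, here is a minimal sketch in Python. It is not Colectica's API or the DDI data model; the class names, fields, and item types are hypothetical and chosen only to illustrate the two ideas described above: every change creates a new version instead of overwriting the old one, and relationships point at a specific version of another item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    """One immutable version of a metadata item (hypothetical model)."""
    item_id: str
    version: int
    item_type: str            # e.g. "Concept", "Question", "Variable"
    description: str
    references: tuple = ()    # (item_id, version) pairs this item links to


class MetadataRepository:
    """Append-only store: registering a change never overwrites history."""

    def __init__(self):
        self._versions = {}   # item_id -> list of ItemVersion, oldest first

    def register(self, item_id, item_type, description, references=()):
        """Store a new version of an item and return it."""
        history = self._versions.setdefault(item_id, [])
        item = ItemVersion(item_id, len(history) + 1, item_type,
                           description, tuple(references))
        history.append(item)
        return item

    def get(self, item_id, version=None):
        """Return a specific version, or the latest one if none is given."""
        history = self._versions[item_id]
        return history[-1] if version is None else history[version - 1]


repo = MetadataRepository()
concept = repo.register("c1", "Concept", "Household income")
question = repo.register("q1", "Question", "What was your income last year?",
                         references=[(concept.item_id, concept.version)])
# Revising the concept creates version 2; the question still links to version 1.
repo.register("c1", "Concept", "Total household income before tax")
```

Because the question records the exact version of the concept it was written against, a later revision of the concept does not silently change what the question is documented as measuring.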
This process is completely storage-agnostic: the data can be described whether it is stored in a relational database or in a big data infrastructure like Hadoop. Querying for metadata from the repository is also straightforward because the repository reflects the client’s infrastructure decisions. Most implementations of Colectica occur within the clients’ own networks because of organizational policies and the sensitivity of their data. However, Colectica does provide a hosted option for added flexibility.
Their platform has provided rigor and commonality for research groups within National Statistical Offices (NSOs). These entities need to document and automate many of their data processes, and to do so using open and standardized models.
In addition to agencies within NSOs, individual researchers who practice survey methodologies are also interested in structuring their data for analysis. These researchers, who are actively creating questions and measuring the opinions, behaviors, and experiences of the public, can use the platform to document their research and compare it with that of others.
As more industries turn to Open Standards for structuring the deluge of data, Colectica hopes to play an active role in describing and linking data through identification and versioning techniques. Market research firms, which have traditionally conducted one-off projects like opinion polls, have generally not concentrated on documentation and standards. However, this is changing as clients become more sophisticated with data. Clients have started to request complete and portable metadata documentation of their data and the methods used to collect it. Colectica is hoping that this will represent an interesting opportunity for them in the near future.
Colectica is looking to secure Phase II funding for an SBIR grant through the NIH this spring. As more organizations turn to Open Standards to make sense of huge amounts of data for analysis, Colectica is well positioned to continue building out their solution as a secure and structured platform.