The Military Health Data Mining Algorithms Library (M-HDML) applies data mining technologies and techniques to the storage and retrieval of patient data, helping doctors in the DoD’s vast Military Health System (MHS) diagnose their patients more accurately.
In our increasingly data-centric world, data mining technologies are being enlisted for a wide variety of uses, from retail sales to video gaming to, most recently, combating terrorism. The staggering volume of data now being collected has raised the stock of intelligent data mining systems and knowledge discovery techniques that help users extract meaningful information from enormous data sets. In industry, more and more organizations are investing in data mining software and hardware as a means of gaining profitable business insights from their huge central transactional databases. The Gartner Group estimates that the use of data mining applications will increase from less than 5% currently to 80% over the next decade.*
KBSI’s Military Health Data Mining Algorithms Library (M-HDML) initiative, an SBIR award from the Department of Defense (DoD), applied data mining technologies and techniques in a relatively new arena: medical diagnostics. The initiative was part of the DoD’s Knowledge Management movement aimed at improving the reliability and utility of the DoD’s knowledge assets.
Phase II Development
Like other industries, health care both collects and uses diverse types of data: not only clinical data crucial to diagnosing and monitoring the health of patients, but also data necessary to the administration of hospitals, medical resources, and health care in general.
KBSI’s M-HDML initiative took a unique approach to developing data mining and analysis techniques, templates, and software tools for the DoD’s vast Military Health System (MHS). The goal was twofold: to improve individual patient care and, by translating clinical data into a standardized form that lends itself to data mining, to help the DoD enhance medical readiness, an important component of military readiness.
The primary challenge that the KBSI team addressed centered on questions of data taxonomy with regard to the data mining algorithms. Dr. Satheesh Ramachandran, who headed the initiative for KBSI, explains: “Different data mining algorithms support the discovery of different taxonomical knowledge types. These types, consequently, require different processes to determine their relevance and application. Simply put, the type of knowledge that is being sought dictates the choice of the specific data-mining tasks and the algorithms that support those tasks.”
The types of clinical data that M-HDML works with include questionnaires, diagnostic reports, pathology reports, lab tests, evaluations and progress reports, and medical studies and experiments. These data are collected at various locations over various, often protracted, periods of time. More significantly, the data can exist in a range of systems and formats, from rudimentary paper forms to electronic files.
The 1996 Clinger-Cohen Act mandated improvements to the efficiency of the DoD medical health system, and a subsequent review of the system noted the need for a thorough business process reengineering (BPR) of the functions, processes, and information systems of the MHS infrastructure. Key tasks performed by the information management directorate of the MHS include establishing flexible, open-systems-compliant configurations for clinical data, with the goal of making accurate medical information available wherever and whenever it is needed. Redesigning the information systems for more pliable data hosting is an important first step. The next logical step is to introduce knowledge discovery tools and techniques that explore these integrated systems and deliver the useful knowledge they contain.
In keeping with this paradigm, the M-HDML initiative first focused on developing a generic representation scheme that would encapsulate the various types of clinical data. As might be expected, existing clinical data and databases are not based on well-thought-out representations for signs, symptoms, treatments, tests, and so on, and little work had been done to develop a general language or structure for representing clinical data in a way that is conducive to quick analysis. The choice of a particular data mining algorithm depends largely on the nature of the data (how data concerning, for example, a condition’s severity or frequency are encoded) and the type of paradigm in which the algorithm must function. M-HDML’s standardized representation framework allows multiple data analysis paradigms to work in tandem, exchanging information and complementing one another. For M-HDML, KBSI based these representations on the notion of ontologies: structured languages for representing knowledge.
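As a rough illustration of what an ontology-backed representation scheme might look like, the following Python sketch normalizes a clinical observation from any source into one common record. It is not the M-HDML schema: the concept hierarchy, the severity scale, and every name in it (ONTOLOGY, SEVERITY_SCALE, Observation, normalize, is_a) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-ontology (an assumption for illustration):
# each concept maps to its parent concept; the root maps to None.
ONTOLOGY = {
    "finding": None,
    "symptom": "finding",
    "sign": "finding",
    "headache": "symptom",
    "fever": "sign",
}

# Hypothetical ordinal encoding of severity labels.
SEVERITY_SCALE = {"mild": 1, "moderate": 2, "severe": 3}


@dataclass(frozen=True)
class Observation:
    """One clinical observation in a source-independent form."""
    patient_id: str
    concept: str             # a key of ONTOLOGY
    severity: Optional[int]  # ordinal code, or None if not recorded
    source: str              # e.g. "questionnaire", "lab report"


def is_a(concept: str, ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or descends from it."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY.get(current)
    return False


def normalize(patient_id: str, concept: str,
              severity_label: Optional[str], source: str) -> Observation:
    """Map free-form input onto the shared representation."""
    key = concept.strip().lower()
    if key not in ONTOLOGY:
        raise ValueError(f"unknown concept: {concept!r}")
    severity = None
    if severity_label is not None:
        label = severity_label.strip().lower()
        if label not in SEVERITY_SCALE:
            raise ValueError(f"unknown severity: {severity_label!r}")
        severity = SEVERITY_SCALE[label]
    return Observation(patient_id, key, severity, source)


if __name__ == "__main__":
    obs = normalize("patient-001", "Headache", "Severe", "questionnaire")
    print(obs)
    print(is_a(obs.concept, "symptom"))
```

Because every record carries a concept drawn from one hierarchy and a severity on one ordinal scale, an analysis written against Observation does not need to know whether the data began as a paper form or an electronic file.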
A related M-HDML task was to develop a catalog of the vast array of data mining algorithms that could be used and to map these algorithms, according to their success, to the knowledge requirements of the Military Health System. These algorithm maps serve as templates for common knowledge discovery tasks. Because they codify common techniques, the templates also lend themselves to user adaptation or customization for related data inquiries.
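The idea of an algorithm map that doubles as a reusable template can be sketched in a few lines of Python. The entries below are generic textbook pairings of knowledge type, task, and algorithm, not the contents of the M-HDML catalog, and the names Template, ALGORITHM_MAP, select_template, and customize are assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Template:
    """A reusable recipe for one kind of knowledge discovery task."""
    knowledge_type: str
    task: str
    algorithms: Tuple[str, ...]


# Illustrative map from the knowledge sought to a task and candidate
# algorithms. These pairings are assumptions, not the M-HDML catalog.
ALGORITHM_MAP = {
    "predictive rule": Template(
        "predictive rule", "classification",
        ("decision tree", "naive Bayes")),
    "natural grouping": Template(
        "natural grouping", "clustering",
        ("k-means", "hierarchical clustering")),
    "co-occurrence pattern": Template(
        "co-occurrence pattern", "association rule mining",
        ("Apriori",)),
}


def select_template(knowledge_type: str) -> Template:
    """Look up the template for the type of knowledge being sought."""
    try:
        return ALGORITHM_MAP[knowledge_type]
    except KeyError:
        raise ValueError(f"no template for {knowledge_type!r}") from None


def customize(template: Template, **overrides) -> Template:
    """Return an adapted copy; the cataloged template is left unchanged."""
    return replace(template, **overrides)


if __name__ == "__main__":
    base = select_template("natural grouping")
    adapted = customize(base, algorithms=("k-means",))
    print(base)
    print(adapted)
```

Keeping the templates immutable and returning copies from customize means a user can adapt a template for a related inquiry without altering the shared catalog entry.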
The algorithm maps gave the M-HDML team a clear understanding of the nature of these algorithms and a better sense of the shape of the data representation structure. KBSI had created similar libraries of data mining algorithms through earlier initiatives funded by the DoD and NASA, and had also created and validated such a library for a host of applications in DoD logistics. The M-HDML library of algorithms and the established framework allowed the M-HDML team to begin detailing typical data mining development processes: rules for using domain knowledge structures.
The final step and end goal of M-HDML was to use these rules to develop an overarching framework for the integrated use of data mining algorithms and strategies (the Military Health Data Mining Library) and to encapsulate the framework in a supporting software environment. This software toolkit is Web-enabled, allowing users to collaborate regardless of their geographic location and to exchange data, templates, analyses, and, most importantly, knowledge.
The M-HDML initiative provides the DoD with much-needed knowledge that is useful from a purely medical standpoint. However, the M-HDML concept also offers a novel approach to large-scale data mining and knowledge discovery that can have an even wider application. The cataloging of algorithms and templates, in effect the creation of a data mining library, provides a sophisticated framework for other knowledge discovery projects that involve large, disparate stores of data and large, diverse sets of users. KBSI’s approach not only benefits the DoD in its Knowledge Management push, but can also help the growing number of commercial businesses that are expanding the possibilities and potential of data mining.
* Brachman, R. J., and Anand, T. 1996. “The Process of Knowledge Discovery in Databases.” In Advances in Knowledge Discovery and Data Mining, pp. 37-57. AAAI Press/MIT Press.