ONSITE applies an innovative approach to natural language processing (NLP) to achieve state-of-the-art processing rates in support of military tactical operations. The initiative also investigated ontologies as a means of achieving higher processing throughput and improved semantic resolution of extracted information.
KBSI has been awarded funding from the Defense Advanced Research Projects Agency (DARPA) to research, design, and demonstrate enabling technology for Open-Source Information Tactical Exploitation (ONSITE). The ONSITE initiative applies an innovative approach to natural language processing (NLP) that aims to achieve state-of-the-art processing rates for natural language understanding in support of military tactical operations. DARPA’s goal is to improve natural language processing speed and efficiency despite constrained computational resources, accelerated operational timelines, and specific intelligence objectives. Faster, more efficient NLP allows warfighters to process data more quickly in pursuit of tactical advantage.
The ONSITE technology addresses the shortcomings of current methods for processing open-source, natural-language information. While the automated support of intelligence analysis has matured with the emergence of large-scale storage and data-processing capabilities and sophisticated natural-language processing algorithms, tactical applications that use NLP are hampered by the processing speed of current algorithms and the restrictions of tactical computing platforms. KBSI’s approach departs from traditional statistical and grammar-rule-based approaches to shallow NLP (e.g., part-of-speech tagging and phrase chunking) by instead using a specially constructed ontology, which led to higher processing throughput and increased semantic resolution of extracted information.
The initial phase of the ONSITE initiative investigated and developed the following:
- Optimized heuristics that limit the amount of math-related processing (especially floating-point operations), leading to more efficient CPU utilization;
- Heuristics that support the structuring of calculations to allow for better use of higher-speed CPU cache memory; and
- Heuristics that improve so-called “shallow” natural language processing by requiring less system computation and increasing robustness despite noisy data.
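As a rough illustration of the first two heuristics, the sketch below scores tokens using integer (fixed-point) weights held in one contiguous array, rather than floating-point values scattered across separate objects. It is not ONSITE's implementation: the vocabulary, weights, scale factor, and function names are all hypothetical. Python is used only for readability and would not itself show the speed benefit that a compiled implementation might.

```python
from array import array

# Hypothetical fixed-point scale: weights are integers in units of 1/1024.
SCALE_BITS = 10

# Hypothetical vocabulary mapping tokens to dense integer IDs.
VOCAB = {"convoy": 0, "bridge": 1, "market": 2, "meeting": 3}

# Weights stored contiguously and indexed by token ID.
# In fixed point these represent 1.5, 1.0, 0.25 and 0.5.
WEIGHTS = array("i", [1536, 1024, 256, 512])


def score(tokens):
    """Sum the fixed-point weights of known tokens using integer addition only."""
    total = 0
    for tok in tokens:
        idx = VOCAB.get(tok.lower())
        if idx is not None:
            total += WEIGHTS[idx]
    return total


def exceeds(tokens, threshold_fixed):
    """Compare against a threshold that is also expressed in fixed point."""
    return score(tokens) >= threshold_fixed


if __name__ == "__main__":
    sentence = ["Convoy", "crossed", "the", "bridge"]
    print(score(sentence), exceeds(sentence, 2 << SCALE_BITS))
```

Keeping the threshold in the same fixed-point units means no conversion to floating point is needed anywhere in the scoring loop.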
In the initial phase, the semantically rich extractions that were generated focused on supporting influence operations that targeted enemy networks, both physical (existing in the real world) and virtual (existing online).
Phase II Development
The ONSITE project seeks to provide high-speed, ontology-based searching of unstructured and un-indexed natural language text. In the ONSITE technology, searching is guided by a set of user-defined queries whose definition and representation draw on concepts and lexemes from a large-scale, common-sense ontology and a comprehensive lexicon.
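To make the idea of ontology-guided search concrete, the following minimal sketch expands a query concept into its sub-concepts, looks up the lexemes for each, and scans raw, un-indexed text for them. The tiny ontology, the lexicon, and the function names are hypothetical stand-ins, and the regular-expression matching is an assumption chosen for brevity, not a description of how ONSITE performs its search.

```python
import re

# Hypothetical ontology fragment: concept -> direct sub-concepts.
SUBCONCEPTS = {
    "VEHICLE": ["TRUCK", "AIRCRAFT"],
    "TRUCK": [],
    "AIRCRAFT": ["HELICOPTER"],
    "HELICOPTER": [],
}

# Hypothetical lexicon: concept -> lexemes that can express it.
LEXICON = {
    "VEHICLE": ["vehicle"],
    "TRUCK": ["truck"],
    "AIRCRAFT": ["aircraft", "plane"],
    "HELICOPTER": ["helicopter", "chopper"],
}


def expand_concept(concept):
    """Return the concept together with all of its descendants."""
    seen = []
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(SUBCONCEPTS.get(current, []))
    return seen


def search(text, concept):
    """Scan text for lexemes of the concept or any of its descendants.

    Returns (lexeme, concept, character offset) tuples in text order.
    """
    lexeme_to_concept = {}
    for c in expand_concept(concept):
        for lexeme in LEXICON.get(c, []):
            lexeme_to_concept[lexeme] = c
    if not lexeme_to_concept:
        return []
    alternatives = "|".join(
        re.escape(lexeme)
        for lexeme in sorted(lexeme_to_concept, key=len, reverse=True)
    )
    # Crude plural handling: allow an optional trailing "s".
    pattern = re.compile(r"\b(" + alternatives + r")s?\b", re.IGNORECASE)
    return [
        (m.group(1).lower(), lexeme_to_concept[m.group(1).lower()], m.start())
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for hit in search("A chopper landed near two trucks.", "VEHICLE"):
        print(hit)
```

Because the query is expressed as a concept rather than a keyword, a search for the hypothetical `VEHICLE` concept also finds mentions of helicopters and trucks without the user listing them.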
The ONSITE system configuration combines named entity recognition and social network extraction. In Phase II of the initiative, KBSI configured a semantic pattern library (SPL) from the ontological semantic resource (OSR) and designed a method for mapping user requests into the SPL. KBSI also designed an enhanced method for mapping semantic patterns (in the SPL) to syntactic patterns, as well as methods for extracting those syntactic patterns from text. This work resulted in the development of a toolkit for semantic information processing known as the OSR Studio™ toolkit.
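One way to picture the mapping from semantic patterns to syntactic patterns is sketched below: each semantic relation is turned into a surface pattern of the form "name, verb phrase, name", and matches are emitted as edges of a social network. The pattern library contents, the crude capitalized-word name matcher, and the function names are all hypothetical; this is a toy under those assumptions, not the OSR Studio™ method.

```python
import re

# Hypothetical semantic pattern library: relation -> verb phrases expressing it.
SPL = {
    "MEETS_WITH": ["met with", "met"],
    "FUNDS": ["funded", "financed"],
}

# Crude stand-in for named entity recognition: runs of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"


def to_syntactic_pattern(relation):
    """Map a semantic relation to a regex over surface text."""
    # Longest verb phrase first so "met with" is preferred over "met".
    verbs = sorted(SPL[relation], key=len, reverse=True)
    verb_alt = "|".join(re.escape(v) for v in verbs)
    return re.compile(
        rf"(?P<agent>{NAME}) (?:{verb_alt}) (?P<patient>{NAME})"
    )


def extract_edges(text):
    """Return (agent, relation, patient) triples found in the text."""
    edges = []
    for relation in SPL:
        for m in to_syntactic_pattern(relation).finditer(text):
            edges.append((m.group("agent"), relation, m.group("patient")))
    return edges


if __name__ == "__main__":
    sample = "Alice Smith met with Bob Jones. Bob Jones funded Carol White."
    for edge in extract_edges(sample):
        print(edge)
```

A real system would need proper entity recognition and far richer syntactic patterns; for instance, this matcher would wrongly absorb a capitalized sentence-initial word into the agent's name.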
ONSITE applications allow users to rapidly process unstructured and un-indexed natural language text, identifying and extracting information that can support a wide range of tactical applications in the field. In general, ONSITE provides ontology-focused entity and relationship identification and extraction. Depending on how users configure it, the ONSITE system can support many different types of applications, including military intelligence and computer forensics.
ONSITE technology components are now being evaluated for use by multiple organizations including the National Air and Space Intelligence Center (NASIC), the Joint Information Operations Warfare Center (JIOWC), and the Joint Warfare Analysis Center (JWAC).