Project Blotter

Background and requirement 

Intelligence analysts search through large volumes of disparate, unstructured reports to identify and extract relevant entities (people, organisations, equipment, etc.) and to discover valuable relationships between them. The information is collated into spreadsheets and/or written notes, and is then mapped, with varying degrees of success, into the appropriate downstream systems.

Defence Intelligence wished to adopt current and emerging Natural Language Processing (NLP) technology to support knowledge extraction using automated processes and thereby improve analyst efficiency.

To address this, TP Group were tasked by DI-RIC with generating a labelled Defence Intelligence dataset based on a set of documents from an MoD training exercise. These documents were labelled in accordance with the MoD’s information exchange standards.

To deliver this, TP Group:

  • Identified a commercial software package, UBIAI, for labelling the documents. The browser-based tool allows manual labelling of the provided dataset and was designed to support collaboration and to encourage accurate and consistent labelling.
  • Developed a labelling schema to ensure that all relevant entities were identified and labelled correctly and consistently across the ~200 documents.
  • Labelled the documents within UBIAI with the relevant entities identified in the schema.
  • Developed a series of programmatic checks and processes to identify and correct inconsistent labelling, ensuring an accurate dataset.
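
As an illustration of the kind of programmatic check described in the last point, the sketch below flags any piece of text that has been given more than one label across the document set. It is a minimal example under stated assumptions, not the checks actually used on the project: the flat (document, text, label) format, the function name find_inconsistent_labels, the label names and the sample data are all hypothetical, and no UBIAI export format is assumed.

```python
from collections import defaultdict


def find_inconsistent_labels(annotations):
    """Return text spans that were given more than one label.

    `annotations` is an iterable of (doc_id, span_text, label) tuples.
    This flat format is an assumption made for the example only.
    """
    labels_by_text = defaultdict(lambda: defaultdict(set))
    for doc_id, span_text, label in annotations:
        # Normalise case and whitespace so trivial variants are compared together.
        key = " ".join(span_text.lower().split())
        labels_by_text[key][label].add(doc_id)
    # Keep only the spans that carry two or more different labels.
    return {
        text: {label: sorted(docs) for label, docs in by_label.items()}
        for text, by_label in labels_by_text.items()
        if len(by_label) > 1
    }


# Hypothetical sample data; entity names and labels are invented.
example = [
    ("doc1", "Alpha Company", "ORGANISATION"),
    ("doc2", "alpha  company", "ORGANISATION"),
    ("doc3", "Alpha Company", "PERSON"),
    ("doc3", "Truck 12", "EQUIPMENT"),
]
conflicts = find_inconsistent_labels(example)
for text, by_label in conflicts.items():
    print(text, by_label)
```

A report like this only points a reviewer at candidate conflicts. Some text can legitimately carry different labels in different contexts, so each flagged case would still need a human decision before any label is corrected.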


The data provided to DI-RIC allowed them, for the first time, to apply advanced NLP techniques to a defence-specific dataset. This will give them valuable techniques for analysing complex unstructured text data more quickly and automatically. Compared with previously released defence-specific datasets, the complexity of the schema, and therefore the granularity of analysis possible, increased threefold.
