RWE

Background and requirement

Human-entered data will, by nature, contain some errors. This is particularly true when people enter large amounts of data under time pressure, as was the case for an Army dataset comprising orders for replacement parts for rotary wing aircraft.

TP Group were tasked with providing assurance that this data was correctly labelled, and with implementing corrections where needed. The dataset consisted of hundreds of rows with text and categorical columns, all of which had to be taken into account when determining whether a row had been correctly categorised.

Approach

  • Manually labelled a subset of the data, which was then used for training, validation, and testing.
  • Used undersampling to balance the different classes in the data.
  • Used TF-IDF (Term Frequency - Inverse Document Frequency) on the text columns to provide features to train the model on.
  • Trained a TensorFlow neural network classifier on these features.
  • Used the model's prediction confidence to decide which predictions to accept, so that the data was labelled to an acceptable standard of accuracy.
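
The undersampling and TF-IDF steps can be illustrated with a minimal sketch. It uses only the Python standard library so that it is easy to run; it is not the project's code, and in practice a library implementation such as scikit-learn's TfidfVectorizer would normally be used. The function names, the example rows, the class names, and the exact TF-IDF weighting are all assumptions made for illustration, and the categorical columns are not covered here.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


def tfidf(documents):
    """Return (vocabulary, vectors) weighted as tf * (ln(N / df) + 1).

    This is one common TF-IDF variant, chosen here as an assumption.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([
            (counts[term] / total) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term in vocabulary
        ])
    return vocabulary, vectors


# Invented example rows; the real dataset is not shown in the source.
texts = [
    "rotor blade replacement",
    "rotor hub bearing",
    "tail rotor gearbox seal",
    "cabin seat cover",
    "hydraulic pump seal",
    "main rotor blade bolt",
]
labels = ["rotor", "rotor", "rotor", "other", "other", "rotor"]

balanced_texts, balanced_labels = undersample(texts, labels)
vocabulary, features = tfidf(balanced_texts)
```

Each row of `features` is a numeric vector that could be passed to a classifier. Terms that appear in few documents receive a higher weight than terms that appear in most of them.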

Outcome

The final model could label data with 83% accuracy. Used this way, the model provided an extra check for the human user, helping them label more confidently in borderline cases. By accepting only the predictions whose confidence exceeded a threshold, we could achieve 99% accuracy on the predictions that were kept. This saved the hundreds of hours it would have taken a human to label those rows manually.
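
The effect of a confidence threshold can be sketched as follows. This is a hedged illustration, not the project's code: the function names, the probabilities, the true labels, and the 0.9 threshold are all invented, and the source does not state which threshold was actually used.

```python
def accept_confident(probabilities, threshold):
    """Return (row index, predicted class) pairs whose top probability meets the threshold."""
    accepted = []
    for i, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            accepted.append((i, best))
    return accepted


def accuracy(predictions, true_labels):
    """Fraction of (row index, predicted class) pairs that match the true label."""
    if not predictions:
        return 0.0
    correct = sum(1 for i, pred in predictions if true_labels[i] == pred)
    return correct / len(predictions)


# Invented class probabilities, as a classifier's softmax output might look.
probabilities = [
    [0.97, 0.03],
    [0.55, 0.45],
    [0.08, 0.92],
    [0.60, 0.40],
    [0.02, 0.98],
]
true_labels = [0, 1, 1, 0, 1]

all_rows = accept_confident(probabilities, threshold=0.0)   # keep every prediction
confident = accept_confident(probabilities, threshold=0.9)  # keep only confident ones

overall_accuracy = accuracy(all_rows, true_labels)
confident_accuracy = accuracy(confident, true_labels)
coverage = len(confident) / len(probabilities)  # share of rows labelled automatically
```

The trade-off is between accuracy and coverage: raising the threshold makes the accepted predictions more reliable, but leaves more rows for a human to review.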
