A client operating a conversational agent struggled to improve performance across all question domains and could not deploy the system. How could they identify where deployment was possible and what to prioritise?
Benchmarking - which question classes are most unreliable and susceptible to changes in speech?
Mitigation - how do we mitigate underperforming classes?
Deployment - can we deploy the model to a production environment?
Benchmarking - Advai's library of NLP stress tests and dictionaries can identify common failures in NLP systems.
Optimisation - Common misspellings and dictionary variants can be identified and incorporated into the training pipeline to increase overall model performance.
Deployment - Underperforming classes can be routed to manual intervention while the model is retrained, so the system can be partially deployed and the base level of automation increased.
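The workflow above can be sketched as a simple perturbation benchmark: introduce spelling mistakes into test questions, measure accuracy per question class, and flag weak classes for manual triage. This is an illustrative sketch only; the perturbation strategy, class names, and threshold are assumptions, not Advai's actual tooling.

```python
import random

def perturb(text, rate=0.15, seed=0):
    """Introduce simple spelling mistakes by swapping adjacent letters.
    (One of many possible perturbations; a real stress-test library
    would draw on dictionaries of common misspellings.)"""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def per_class_accuracy(classify, examples):
    """examples: list of (question, true_class) pairs.
    Returns {class: accuracy under perturbation}."""
    totals, correct = {}, {}
    for question, label in examples:
        totals[label] = totals.get(label, 0) + 1
        if classify(perturb(question)) == label:
            correct[label] = correct.get(label, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in totals.items()}

def flag_for_triage(scores, threshold=0.8):
    """Classes scoring below the threshold are routed to manual intervention."""
    return sorted(c for c, acc in scores.items() if acc < threshold)

# Toy stand-in classifier for demonstration (hypothetical):
def toy_classify(question):
    return "billing" if "bill" in question.lower() else "other"

examples = [("Where is my bill?", "billing"), ("What is your address?", "other")]
scores = per_class_accuracy(toy_classify, examples)
weak = flag_for_triage(scores)
```

Classes in `weak` would be handled manually while retraining targets them, letting the remaining classes go live.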
By understanding the strengths and weaknesses of the language being processed, the NLP model can be made more robust to changes in terminology.
The data scientist is able to proactively identify issues before deployment, increasing the velocity of performance improvements and producing a more robust model.
Because the customer knows which parts of their AI model perform well or poorly, they can automate the triage process effectively and release value from the project in production.