Validation results
As mentioned in the previous section, errors are propagated throughout the pipeline whenever there are two or more machine learning models running in sequence. For this solution, the sentiment of the sentence is the most important factor in measuring the firm’s stock risk level. The speech-to-text model, although essential to the pipeline, serves as the preprocessing unit before the sentiments can be predicted. What really matters is the difference in sentiment between the ground truth sentences and the predicted sentences. This serves as a proxy for the word error rate (WER). The speech-to-text accuracy is important, but the WER is not directly used in the final pipeline metric.
PIPELINE_SENTIMENT_METRIC = MEAN(DIFF(GT_sentiment, ASR_sentiment))
These sentiment metrics can be calculated for the F1 Score, Recall, and Precision of each sentence. The results can be then aggregated and displayed within a confusion matrix, along with the confidence intervals for each metric.
The benefit of using transfer learning is an increase in model performance for a fraction of data requirements, training time, and cost. The fine-tuned models should also be compared to their baseline versions to ensure the transfer learning enhances the performance instead of impairing it. In other words, the fine-tuned model should perform better on the support center data than the pretrained model.
Pipeline assessment
Test case | Details |
---|---|
Test number |
Pipeline sentiment metric |
Test prerequisites |
Fine-tuned models for speech-to-text and sentiment analysis models |
Expected outcome |
The sentiment metric of the fine-tuned model performs better than the original pretrained model. |
Pipeline sentiment metric
-
Calculate the sentiment metric for the baseline model.
-
Calculate the sentiment metric for the fine-tuned model.
-
Calculate the difference between those metrics.
-
Average the differences across all sentences.