LARGE LANGUAGE MODELS FOR ENTERPRISE WORKFLOW AUTOMATION IN FINANCIAL OPERATIONS
Keywords:
Intent classification, Financial workflow automation, Customer-query routing, TF-IDF, Logistic regression, Banking77, Reproducible baselines, Lightweight NLP
Abstract
Intent classification is a key component of customer-query routing in financial workflow automation. While large language models (LLMs) have attracted substantial interest, their cost, latency, and on-premises deployment constraints motivate efficient routing systems that can operate reliably on standard CPU-based infrastructure. This study examines whether lightweight CPU-based classifiers can serve as a cost-effective routing layer in LLM-era enterprise automation systems. We present a reproducible benchmark on the public Banking77 dataset (13,083 queries, 77 fine-grained intents) comparing five CPU-only pipelines built from term-frequency–inverse-document-frequency (TF-IDF) features and linear classifiers. Our proposed pipeline, LR-Fusion, combines word- and character-level TF-IDF representations within a single logistic regression classifier and attains 0.9123 accuracy and 0.9124 macro-F1 (95% CI [0.9017, 0.9219]) with a median single-query latency of 7.0 ms on a single CPU core. Paired-bootstrap tests indicate that the improvement of LR-Fusion over each baseline is consistent across resamples at the 0.05 significance level. Error analysis identifies semantically overlapping intent pairs (e.g., verify_my_identity vs. why_verify_identity) as the primary residual failure mode, suggesting a potential role for selective LLM-based reranking in future hybrid systems. The implementation package is provided to support reproducibility of the reported experiments.
References
[1] Brown T B, Mann B, Ryder N, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 2020, 33: 1877-1901.
[2] Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS), 2020, 33: 9459-9474.
[3] Wu S, Irsoy O, Lu S, et al. BloombergGPT: A large language model for finance. arXiv preprint, 2023. DOI: 10.48550/arXiv.2303.17564.
[4] Casanueva I, Temčinas T, Gerz D, et al. Efficient intent detection with dual sentence encoders. Proc. 2nd Workshop on Natural Language Processing for Conversational AI (ACL), 2020: 38-45.
[5] Larson S, Mahendran A, Peper J J, et al. An evaluation dataset for intent classification and out-of-scope prediction. Proc. EMNLP-IJCNLP, Hong Kong, China, 2019: 1311-1316.
[6] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems (NeurIPS), 2017, 30: 5998-6008.
[7] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT, Minneapolis, MN, USA, 2019: 4171-4186.
[8] Araci D. FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint, 2019. DOI: 10.48550/arXiv.1908.10063.
[9] Yang Y, Uy M C S, Huang A. FinBERT: A pretrained language model for financial communications. arXiv preprint, 2020. DOI: 10.48550/arXiv.2006.08097.
[10] Joachims T. Text categorization with support vector machines: Learning with many relevant features. Proc. 10th European Conference on Machine Learning (ECML), LNCS, Springer, 1998, 1398: 137-142.
[11] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011, 12(85): 2825-2830.