AI in Factoring: A Reality Check on Large Language Models
Written by: Emilia Apostolova, PhD, CTO, Peruse
In just a few years, ChatGPT and similar models (and there are many competitors) have overhauled everyone's expectations of what can be achieved with AI.
Historically, the field of AI, dating back to the 1950s, has gone through multiple periods of excitement followed by disappointment. Previously, however, the excitement was mostly confined to the nerdy and never approached the current hysteria. Even when IBM Watson won Jeopardy! in 2011, a win followed by a massive commercialization campaign (does anyone remember the 2016 Bob Dylan IBM Watson commercial?), the general public was mostly uninterested in AI. Subsequent commercialization attempts disappointed, and AI was mostly shrugged off with "Oh well, that didn't work."
So, what is different now? Is ChatGPT such a massive breakthrough? Like most scientific progress, breakthroughs come in the form of tiny steps that accumulate and, when the stars align, lead to impressive results. Fast and affordable computing, large digital datasets, and advances in deep learning all led to the building of larger and larger language models. These large language models (LLMs) can "learn" a big chunk of the Internet, which in turn contains most of human knowledge.
But then again, what exactly does "learning" mean? If you feed a machine learning model hundreds of gigabytes of text, including encyclopedias and scientific publications, and just teach it to guess the next word given some preceding words, sure enough, the model will learn and memorize A LOT about human-produced text. The question is whether this LLM knowledge is similar to human knowledge and whether it can substitute for human expertise. While this is largely a philosophical question, let's see what it means in practical terms.
Indeed, ChatGPT-like models can pass the medical licensing exam and the bar exam, generate computer code, and write haiku poems, among other things. This, however, is because they were trained on many text examples of precisely that. When it comes to industry-specific tasks, which a minimally trained human can typically perform with ease, things start to look less impressive.
Factoring is no exception: as in most industry-specific domains, factoring-related texts are not flooding the internet and are not accessible to ChatGPT and friends. A person in factoring does not blog about load bundles they reviewed, or narrate the thought process they used to deduce that a load is missing paperwork, or write a Wikipedia article about recent trends in carrier fraud. As a result, utilizing ChatGPT-like models for business tasks such as reviewing paperwork is far from trivial.
In particular, when using ChatGPT-like models in factoring, one will be faced with the following issues: high complexity, low accuracy, significant cost, possible scalability issues, and privacy and risk concerns. Let's discuss each of these in turn.
High Complexity
To be handled by ChatGPT, a factoring decision needs to be broken into a number of smaller, solvable steps.
Let's focus on load purchasing in factoring. A factoring operations team receives loads (typically a rate confirmation and a proof of delivery) and decides whether each load should be funded, checking for missing paperwork, fraud, double brokering, etc. Achieving this with current AI systems is surprisingly difficult. The task needs to be broken into a series of simple questions: Did I receive all the paperwork? Do I have both the rate confirmation and the proof of delivery, are they legible, and are any pages missing? Does the rate confirmation look fraudulent in terms of dates or rates? Does the proof of delivery match the rate confirmation? Was the load delivered, and is there a receiver signature? Are there any damages, shortages, or overages? And so on. Each of these questions is a ChatGPT engineering effort in itself, and the efforts add up in terms of engineering cost, maintenance cost, and API usage costs.
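To make this decomposition concrete, here is a minimal sketch of running each sub-question as its own request. It assumes the OpenAI Python SDK; the model name, prompts, and helper names are illustrative, not a production design.

```python
# A minimal sketch: each verification question becomes its own LLM request.
# Assumes the OpenAI Python SDK; model name, prompts, and helper names are
# illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHECKS = [
    "Are both a rate confirmation and a proof of delivery present, legible, and complete?",
    "Do the dates and rates on the rate confirmation look plausible, or do they suggest fraud?",
    "Does the proof of delivery match the rate confirmation (broker, carrier, load number)?",
    "Was the load delivered, and is there a receiver signature?",
    "Are any damages, shortages, or overages noted?",
]

def review_load(paperwork_text: str) -> list[str]:
    """Run each verification question as a separate request and collect answers."""
    answers = []
    for question in CHECKS:
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative model choice
            messages=[
                {"role": "system", "content": "You review freight factoring paperwork."},
                {"role": "user", "content": f"{question}\n\nPaperwork:\n{paperwork_text}"},
            ],
        )
        answers.append(response.choices[0].message.content)
    return answers
```

Each of these prompts needs its own tuning, testing, and monitoring, which is where the engineering and maintenance costs accumulate.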
Low Accuracy
Each of the above small questions can be tackled with state-of-the-art AI. However, we are far from a general-intelligence AI system that can be prompted to answer all of them with close-to-human performance. Let's focus on one of these questions as an example: What are the broker names, carrier names, and load numbers on the rate confirmation? In AI and Natural Language Processing, this task is known as Named Entity Recognition (NER). While ChatGPT's out-of-the-box results are impressive, research shows that its NER performance drops by 30 to 40 absolute percentage points compared to specialized machine learning models trained for the task [1]. Currently, ChatGPT-like models perform substantially below human-level performance on NER tasks.
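To illustrate, extracting these entities with a ChatGPT-like model typically means asking for structured output, roughly as follows. This is a hedged sketch assuming the OpenAI Python SDK; the model name, prompt wording, and JSON keys are assumptions:

```python
# A minimal sketch of entity extraction (NER) via prompting.
# The model name, prompt wording, and JSON keys are assumptions.
import json
from openai import OpenAI

client = OpenAI()

def extract_entities(rate_confirmation_text: str) -> dict:
    """Ask the model for broker name, carrier name, and load number as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        response_format={"type": "json_object"},  # request well-formed JSON
        messages=[{
            "role": "user",
            "content": (
                "Extract the broker name, carrier name, and load number from the "
                "rate confirmation below. Reply as JSON with keys broker, carrier, "
                "and load_number.\n\n" + rate_confirmation_text
            ),
        }],
    )
    return json.loads(response.choices[0].message.content)
```

The output may look convincing, but, per the research above, a specialized model trained on labeled rate confirmations would still be markedly more accurate.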
Significant Cost
ChatGPT is currently shockingly affordable. However, since we have to solve multiple sub-problems, the number of requests, input tokens, and output tokens adds up. We estimate that a load verification request can cost between $0.15 and $0.60 on average, assuming you took the time to fine-tune various ChatGPT models for the load verification sub-problems. This is an ongoing cost, on top of the engineering development and maintenance costs, which can be significant. There is also uncertainty around future pricing and availability, as it has been reported that ChatGPT loses money on each request [2].
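A back-of-the-envelope calculation shows how the sub-problems compound the per-load cost. The token counts, per-token prices, and number of sub-questions below are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope cost per load; all figures are illustrative assumptions.
SUB_QUESTIONS = 6           # assumed checks per load (paperwork, fraud, match, ...)
INPUT_TOKENS = 3_000        # assumed tokens per request (OCR'd paperwork + prompt)
OUTPUT_TOKENS = 200         # assumed tokens per answer
PRICE_IN = 10 / 1_000_000   # assumed dollars per input token ($10 per 1M)
PRICE_OUT = 30 / 1_000_000  # assumed dollars per output token ($30 per 1M)

cost_per_load = SUB_QUESTIONS * (INPUT_TOKENS * PRICE_IN + OUTPUT_TOKENS * PRICE_OUT)
print(f"${cost_per_load:.2f} per load")  # ~$0.22 under these assumptions
```

At thousands of loads per month, even a few dimes per load becomes a meaningful line item.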
Possible Scalability Issues
Larger factors (processing more than 1,000 loads per day) will likely run into ChatGPT rate limits. Depending on the specific implementation and subscription, throughput can be limited to several loads per minute, as there are both tokens-per-minute and requests-per-minute caps.
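A rough throughput estimate illustrates the bottleneck; the limits and token counts below are assumed values, not actual quota figures:

```python
# Rough throughput under assumed rate limits; all figures are illustrative.
TPM = 40_000                 # assumed tokens-per-minute limit
RPM = 500                    # assumed requests-per-minute limit
REQUESTS_PER_LOAD = 6        # assumed sub-questions per load
TOKENS_PER_LOAD = REQUESTS_PER_LOAD * 3_200  # assumed ~3,200 tokens per request

loads_by_tokens = TPM / TOKENS_PER_LOAD      # ~2.1 loads per minute
loads_by_requests = RPM / REQUESTS_PER_LOAD  # ~83 loads per minute
print(f"~{min(loads_by_tokens, loads_by_requests):.0f} loads per minute")
```

Under these assumptions, the tokens-per-minute cap, not the request cap, is the binding constraint.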
Privacy and Risk Concerns
There are options available to protect your confidential data while using the ChatGPT API or ChatGPT Enterprise. However, there are some additional gotchas in the user agreement that your attorney might object to [3]. Mounting intellectual property lawsuits over ChatGPT's use of training data without permission, as well as forthcoming and still uncertain AI legislation, pose additional risk factors that businesses need to evaluate carefully.
In summary, ChatGPT and friends can be valuable tools for a number of AI-assisted tasks. In our opinion, however, the factoring industry should exercise caution and adjust its expectations as to what is feasible with a general-purpose AI tool like ChatGPT.
References:
1. https://arxiv.org/abs/2305.14450
2. https://www.washingtonpost.com/technology/2023/06/05/chatgpt-hidden-cost-gpu-compute/
3. https://simpliant.eu/insights/GDPR-requirements-when-using-chatgpt-api