Copyright © 2024 Adaptive ML, Inc. All rights reserved.
New Startup with $20 Million in Funding Aims to Help Companies Tailor LLMs for Business
The company is working on technology that makes it easier for businesses to train large language models (LLMs) that are tailored to their specific needs.
The seed round is led by Index Ventures, with participation from ICONIQ Capital, Motier Ventures, Databricks Ventures, HuggingFund by Factorial, and several angel investors. The company’s valuation was not disclosed, although the tech publication The Information previously reported that the funding round valued the startup at $100 million.
Adaptive is working on a way to improve a process known as reinforcement learning from human feedback, or RLHF. This process has been key to taking LLMs, which are initially trained on huge amounts of text to predict the next word in a sentence, and making them useful as the engines that power chatbots such as OpenAI’s ChatGPT.
RLHF involves gathering feedback from human evaluators on the quality of an LLM’s responses. The LLM is then further trained to produce answers more like the ones the evaluators rate highly. But RLHF has typically involved hiring contractors to evaluate a model, often using a simple thumbs up or thumbs down to grade its answers. This method is expensive; data annotation contracts account for a significant share of the training costs of LLM-based chatbots, and the feedback is sometimes too crude to produce good results for many business use cases of LLMs.
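To make the mechanics concrete, here is a minimal, illustrative sketch of the pairwise preference step commonly used in RLHF reward modeling. The function name, scores, and loss choice are assumptions for illustration only, not a description of Adaptive's system.

```python
# Illustrative sketch of preference modeling in RLHF (not Adaptive's code).
# A reward model is nudged to score the human-preferred response above the
# rejected one, using a Bradley-Terry style pairwise loss.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Lower loss when the reward model already ranks the preferred response higher."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical reward-model scores for a batch of response pairs.
chosen = torch.tensor([1.2, 0.7, 2.1])    # responses evaluators rated highly
rejected = torch.tensor([0.3, 0.9, 1.0])  # responses evaluators rated poorly
print(float(preference_loss(chosen, rejected)))
```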
“It is hard to get the model to do what you want,” Julien Launay, Adaptive’s cofounder and CEO, said.
Adaptive wants to allow LLMs to learn continuously from how a company’s own employees or customers actually interact with the software. The actions a user takes in response to the LLM’s output are, in many cases, a much richer training signal than a thumbs up or thumbs down given by a paid evaluator.
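One way to picture this: logged user behavior can be mapped to reward signals for further training. The sketch below is a hypothetical illustration; the event names, reward weights, and data structures are assumptions, not Adaptive's platform.

```python
# Hypothetical sketch: turning real user interactions into reward signals,
# instead of relying on paid evaluators' thumbs up/down ratings.
from dataclasses import dataclass

@dataclass
class Interaction:
    prompt: str
    response: str
    next_action: str  # e.g. "accepted", "edited", "regenerated", "abandoned"

# Illustrative mapping from downstream user behavior to a scalar reward.
ACTION_REWARD = {
    "accepted": 1.0,      # user used the output as-is
    "edited": 0.4,        # output was useful but needed fixing
    "regenerated": -0.5,  # user asked for another answer
    "abandoned": -1.0,    # user gave up on the exchange
}

def to_training_example(event: Interaction) -> dict:
    """Convert one logged interaction into a (prompt, response, reward) record."""
    return {
        "prompt": event.prompt,
        "response": event.response,
        "reward": ACTION_REWARD.get(event.next_action, 0.0),
    }

log = [Interaction("Summarize this contract.", "The contract states ...", "edited")]
print(to_training_example(log[0])["reward"])  # 0.4
```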
The platform will also help companies run a process called reinforcement learning from AI feedback (RLAIF), in which a separate AI model critiques the responses of the model being trained. This can lower the cost of training and yield a broader range of training data than relying on human evaluators alone.
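The core RLAIF idea can be sketched as a critic model that grades answers. The prompt, rubric, and helper names below are placeholders for illustration; they do not describe Adaptive's platform API.

```python
# Hypothetical sketch of RLAIF: a separate "critic" LLM grades the trainee
# model's responses instead of human annotators.
import re

CRITIQUE_PROMPT = """Rate the assistant's answer from 1 (poor) to 10 (excellent)
for accuracy and helpfulness. Reply with the number only.

Question: {question}
Answer: {answer}
Rating:"""

def ai_feedback_score(critic_generate, question: str, answer: str) -> float:
    """Ask a critic model for a score; critic_generate is any text-in,
    text-out callable (in practice, a call to a hosted LLM)."""
    reply = critic_generate(CRITIQUE_PROMPT.format(question=question, answer=answer))
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else 0.0

# Stub critic for demonstration; a real pipeline would call an LLM here.
fake_critic = lambda prompt: "8"
print(ai_feedback_score(fake_critic, "What is RLAIF?", "Reinforcement learning from AI feedback."))
```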