Large Language Models (LLMs) unlock a wide spectrum of potential use cases to deliver business value, from analyzing the sentiment of text data stored in a SQL warehouse to deploying real-time chatbots that answer nuanced questions about your products. However, democratizing access to powerful SaaS and open source LLMs for these applications comes with a variety of security, cost, and data-related challenges. For example, consider the specific challenge of effectively managing SaaS LLM API tokens throughout an enterprise:
- Security issues with teams pasting API tokens as plain text in communications
- Cost issues with shared keys leading to application crashes and peaks in costs from rate limit abuse
- Governance issues with every team managing their own API tokens with no guardrails
These challenges prevent organizations from scaling access to SaaS LLM providers (such as OpenAI and Anthropic) and open source models for innovation. Furthermore, to quickly solve business problems with the latest models, data analysts and data scientists need access to cutting-edge LLMs through a standard interface.
Today, we are thrilled to announce the preview of the AI Gateway component in MLflow 2.5. The MLflow AI Gateway is a highly scalable, enterprise-grade API gateway that enables organizations to manage their LLMs and make them available for experimentation and production use cases. Features include centralized management of LLM credentials and deployments, standardized interfaces for common tasks such as chat and completions, and integrations with multiple SaaS and open source LLMs. With the AI Gateway:
- Organizations can secure their LLMs from development through production
- Data analysts can safely query LLMs with cost management guardrails
- Data scientists can seamlessly experiment with a variety of cutting-edge LLMs to build high-quality applications
- ML Engineers can reuse LLMs across multiple deployments
Read on to learn more about how to use the AI Gateway.
Secure access to your LLMs with AI Gateway Routes
Ensuring each use case and application has access to the models it requires is critical, but it’s also important to systematically govern and limit this access to control costs and prevent security breaches. Rather than having each team in your organization manage their own SaaS LLM credentials, the AI Gateway enables centralized access to LLM technologies with guardrails. For example, an organization can manage a single “development” key and “production” key for each SaaS LLM and configure per-user and per-service rate limits.
The AI Gateway provides this centralized access through Routes. A Route represents an LLM from a particular vendor (e.g., OpenAI, Anthropic, or Hugging Face) and defines its associated credentials and configurations. Organizations can simply create Routes for each of their use cases and delegate access to consumers, such as data analysts, data scientists, and production applications, as needed. Consumers can query these Routes behind a standard interface, but they do not have direct access to the credentials or configurations, thus guarding against credential leaks and unauthorized use.
The following code snippet demonstrates how easy it is to create and query an AI Gateway Route using the MLflow Python client:
import os

from mlflow.gateway import set_gateway_uri, create_route, query

# Point the client at the Databricks-hosted AI Gateway
set_gateway_uri("databricks")

# Create a Route for completions with OpenAI GPT-4
create_route(
    name="gpt-4-completions",
    route_type="llm/v1/completions",
    data={
        "name": "gpt-4",
        "provider": "openai",
        "openai_config": {
            # Read the API key from the environment rather than hardcoding it
            "openai_api_key": os.environ["OPENAI_API_KEY"]
        }
    }
)
# Query the Route with a prompt
gpt4_response = query(
    route="gpt-4-completions",
    data={"prompt": "What is MLflow?"}
)
assert gpt4_response == {
    "candidates": [
        {
            "text": "MLflow is an open-source platform for end-to-end ML...",
            "metadata": {"finish_reason": "stop"}
        }
    ],
    "metadata": {
        "input_tokens": 13,
        "output_tokens": 7,
        "total_tokens": 20,
        "model": "gpt-4",
        "route_type": "llm/v1/completions"
    }
}
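Because every completions Route returns this same payload shape, consumers can extract the generated text without any provider-specific parsing:

# Pull the generated text out of the first candidate
response_text = gpt4_response["candidates"][0]["text"]
print(response_text)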
The AI Gateway also supports open source models deployed to Databricks Model Serving, enabling you to reuse an LLM across multiple applications. The following code snippet creates and queries an AI Gateway Route for text completions using a Databricks Model Serving endpoint with the open source MPT-7B-Chat model:
create_route(
    name="oss-mpt-7b-completions",
    route_type="llm/v1/completions",
    data={
        "name": "mpt-7b",
        "provider": "databricks-model-serving",
        "databricks_model_serving_config": {
            "databricks_workspace_url": "https://my.workspace.databricks.com",
            # Read the access token from the environment rather than hardcoding it
            "databricks_api_token": os.environ["DATABRICKS_ACCESS_TOKEN"],
        },
    }
)
mpt_7b_response = query(
    route="oss-mpt-7b-completions",
    data={"prompt": "What is MLflow?"}
)

response_text = mpt_7b_response["candidates"][0]["text"]
assert response_text.startswith("MLflow is an open source ML platform")
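Because consumers interact with Routes purely by name, an application can also discover the Routes available to it without ever handling provider credentials. The following is a minimal sketch, assuming the fluent client’s get_route and search_routes helpers (see the AI Gateway documentation for the exact discovery API):

from mlflow.gateway import get_route, search_routes

# List the Routes this consumer can query; credentials are never returned
for route in search_routes():
    print(route.name, route.route_type)

# Look up a single Route by name
mpt_route = get_route("oss-mpt-7b-completions")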
For more information about Routes, check out the MLflow AI Gateway documentation.
Use the latest and greatest LLMs with a standard interface
To solve business problems and build high-quality applications in a cost-effective way, data analysts and data scientists need to try a variety of SaaS and open source LLMs. Each of these LLMs defines its own request-response format, parameters, and dependencies. Rather than requiring consumers to install specialized software and familiarize themselves with vendor-specific API documentation for each LLM they want to query, the AI Gateway provides a standard REST API for LLM tasks, including chat, completions, and embeddings.
Each Route in the AI Gateway has a type, such as llm/v1/completions for text completions or llm/v1/chat for chat, which determines the request-response format and query parameters. This format is consistent across LLMs from every vendor, enabling data scientists and data analysts to swap one LLM for another and compare results without rewriting application code.
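For example, a Route of type llm/v1/chat accepts a list of messages rather than a single prompt. The snippet below is a minimal sketch, assuming a chat Route named "gpt-4-chat" has already been created for a chat-capable model (the Route name is a placeholder; the payload shape follows the chat task format in the AI Gateway documentation):

chat_response = query(
    route="gpt-4-chat",
    data={
        "messages": [
            {"role": "user", "content": "What is MLflow?"}
        ]
    }
)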
The following code snippet demonstrates this seamless experimentation using the MLflow Python client. By changing a single line, the example code queries two text completions Routes: one for OpenAI GPT-4 and another for Cohere’s Command model.
import os

from mlflow.gateway import set_gateway_uri, create_route, query

set_gateway_uri(gateway_uri="databricks")

# Create a Route for completions with Cohere
create_route(
    name="cohere-completions",
    route_type="llm/v1/completions",
    data={
        "name": "command",
        "provider": "cohere",
        "cohere_config": {
            # Read the API key from the environment rather than hardcoding it
            "cohere_api_key": os.environ["COHERE_API_KEY"]
        }
    }
)
# Query the OpenAI GPT-4 route (see previous section) and the Cohere Route
openai_gpt4_response = query(
    route="gpt-4-completions",
    data={"prompt": "What is MLflow?", "temperature": 0.3, "max_tokens": 100}
)

cohere_command_response = query(
    route="cohere-completions",  # Only the route name changes
    data={"prompt": "What is MLflow?", "temperature": 0.3, "max_tokens": 100}
)
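Because both Routes accept an identical payload, comparing additional providers is as simple as looping over Route names:

# Compare responses from multiple providers using a single payload
payload = {"prompt": "What is MLflow?", "temperature": 0.3, "max_tokens": 100}
for route_name in ["gpt-4-completions", "cohere-completions"]:
    response = query(route=route_name, data=payload)
    print(route_name, "->", response["candidates"][0]["text"])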
For more information about the AI Gateway’s standard interfaces for text completions, chat, and embeddings, check out the MLflow AI Gateway documentation.
Get started with the MLflow AI Gateway
We invite you to secure and accelerate your LLM use cases by trying out the MLflow AI Gateway on Databricks! If you’re an existing Databricks user, contact your Databricks representative to enroll in the AI Gateway Private Preview. If you are not yet a Databricks user, visit databricks.com/product/managed-mlflow to learn more and start a free trial of Databricks and Managed MLflow. Check out the release changelog for more information about the open source MLflow AI Gateway and other features and improvements included in MLflow 2.5.