With the release of VMware Cloud Foundation (VCF) 9.0, VMware has introduced Private AI Services (PAIS), a new suite of tools that delivers a complete platform for enterprises that want to run Private AI on-premises while still enjoying a cloud-like experience.
In this blog, we’ll explore the new features of VMware Private AI Services, walking through the large language model (LLM) lifecycle—from model selection to building Retrieval Augmented Generation (RAG) applications.
Model Selection in VMware Private AI Services #
The lifecycle begins with choosing an LLM. Data scientists and LLM Ops engineers can select from public models like Meta's Llama 3 family or models optimized for NVIDIA GPU Cloud (NGC).
Once a model is chosen, it must be tested against company-specific data for:
- Accuracy
- Bias handling
- Security vulnerabilities
- Performance metrics
To streamline this process, VCF automates testing environments. Engineers can create a Deep Learning Virtual Machine (DLVM) directly from the AI Workstation menu in VCF. The DLVM comes preloaded with the necessary tools for safe and efficient model validation.
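Validation inside the DLVM is not tied to any single framework. As a rough illustration of the accuracy check in the list above, a minimal sketch might compare model answers against a company-specific evaluation set; the `ask_model` function here is a self-contained stub standing in for whatever inference client you actually use from the DLVM:

```python
# Minimal sketch of an accuracy check one might run inside a DLVM.
# `ask_model` is a stub standing in for a real LLM call, so the
# example is self-contained and runnable.

def ask_model(question: str) -> str:
    """Stub for a real inference call made from the DLVM."""
    canned = {"What is VCF?": "VMware Cloud Foundation"}
    return canned.get(question, "unknown")

def exact_match_accuracy(eval_set: list[tuple[str, str]]) -> float:
    """Fraction of questions whose answer exactly matches the expected one."""
    if not eval_set:
        return 0.0
    hits = sum(
        1 for question, expected in eval_set
        if ask_model(question).strip().lower() == expected.strip().lower()
    )
    return hits / len(eval_set)

# Company-specific evaluation set (illustrative).
eval_set = [
    ("What is VCF?", "VMware Cloud Foundation"),
    ("What is PAIS?", "Private AI Services"),
]
print(f"accuracy: {exact_match_accuracy(eval_set):.2f}")
```

A real evaluation would add bias, security, and latency checks alongside accuracy, but the loop structure stays the same: run the model over a curated set and score the results.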
Model Store: Centralized AI Model Management #
After validation, tested models can be stored in a Harbor-based Model Store for enterprise-wide reuse.
- LLM Ops engineers configure the store and control access.
- Models are pushed from the DLVM to the store using the `pais` CLI tool included with Private AI Services.
This ensures organizations maintain model governance, version control, and secure accessibility.
Deploying and Accessing Models #
Application developers and test engineers can access models to power applications like chatbots.
Key features include:
- Model Endpoints – Each deployed model gets a URL and API for integration.
- Multiple Inference Engines – Use vLLM for text-generation (completion) models or Infinity for embedding models.
- GPU-Enabled Deployments – Admins predefine GPU configurations using VMClasses. Models run inside Kubernetes Pods with inference engines, load-balanced via an API Gateway that handles authentication and authorization.
This setup enables scalable, high-performance AI workloads across enterprise environments.
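Because vLLM serves an OpenAI-compatible HTTP API, applications can talk to a published model endpoint with any standard HTTP client. The sketch below only assembles the request; the endpoint URL, token, and model name are placeholders, and the exact path and auth scheme depend on how your API Gateway is configured:

```python
import json
from urllib import request

def build_chat_request(endpoint: str, token: str, model: str, prompt: str):
    """Assemble an OpenAI-style chat-completions request for a vLLM endpoint.

    `endpoint` and `token` are hypothetical placeholders; in practice you
    use the URL shown for the deployed model, and the API Gateway handles
    authentication and authorization.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return request.Request(f"{endpoint}/v1/chat/completions",
                           data=body, headers=headers, method="POST")

req = build_chat_request(
    "https://pais.example.internal/models/llama3",  # placeholder URL
    "MY_API_TOKEN",                                 # placeholder token
    "llama-3-8b-instruct",
    "Summarize our VPN policy.",
)
# Sending it requires a live endpoint:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```

Keeping the client on the standard OpenAI wire format means chatbots and other apps can switch between deployed models by changing only the endpoint URL.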
Data Indexing Service for RAG Applications #
A major highlight of VCF 9.0 Private AI Services is its Data Indexing and Retrieval Service, critical for RAG designs.
With PAIS, engineers can:
- Connect to data sources like Google Drive, Confluence, SharePoint, or S3.
- Create a Knowledge Base for indexing enterprise data.
- Use embedding models to chunk, index, and store data in a vector database for semantic search.
This ensures models can answer questions with context-aware, company-specific knowledge.
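PAIS performs this chunk-embed-store pipeline for you, but the steps above are easy to picture in miniature. The sketch below is purely conceptual: the `embed` function is a toy word-hashing vectorizer standing in for the real embedding model, and a plain Python list stands in for the vector database:

```python
import hashlib
import math

def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character chunks, as an indexing
    service conceptually does before embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into a fixed-size unit vector.
    A real pipeline uses the configured embedding model instead."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.strip(".,?!").encode()).hexdigest(), 16)
        vec[bucket % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# "Knowledge base": chunk a document and store (chunk, vector) pairs.
document = "VMware Cloud Foundation 9.0 adds Private AI Services for on-premises GenAI."
knowledge_base = [(chunk, embed(chunk)) for chunk in chunk_text(document)]
print(f"indexed {len(knowledge_base)} chunks")
```

In production, the vector database also persists metadata (source document, permissions) alongside each chunk so retrieval can respect access controls.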
Agent Builder: Designing RAG Applications #
The Private AI Agent Builder is VMware’s no-code/low-code solution for creating RAG-powered AI applications.
How it works:
- The user query is converted into an embedding.
- The embedding is used to search the vector database.
- The most relevant data is retrieved.
- The LLM generates an accurate, context-driven response grounded in that data.
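Agent Builder runs this retrieval loop internally, but the query-time steps can be sketched conceptually. As before, the `embed` function is a toy word-hashing stand-in for the real embedding model, and an in-memory list stands in for the vector database:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding via word hashing; stands in for the real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.strip(".,?!").encode()).hexdigest(), 16)
        vec[bucket % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# In-memory stand-in for the indexed knowledge base.
chunks = [
    "Expense reports are due on the 5th of each month.",
    "The VPN requires multi-factor authentication.",
    "Office plants are watered on Fridays.",
]
store = [(c, embed(c)) for c in chunks]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Embed the query, search the store, return the top-k chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# The retrieved context is then prepended to the prompt sent to the LLM.
context = retrieve("When are expense reports due?")
prompt = f"Context: {context[0]}\nQuestion: When are expense reports due?"
print(prompt)
```

The final step, which this sketch omits, sends `prompt` to a deployed model endpoint so the LLM answers from the retrieved company data rather than from its training set alone.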
Developers can test agents in real time, automate quality checks with CI/CD pipelines, and roll out model upgrades without downtime. This makes it easy to build enterprise-ready customer service, knowledge management, and analytics applications.
Key Features of VMware Private AI Services in VCF 9.0 #
VMware Private AI Services bring together a comprehensive toolset for end-to-end AI deployment:
- Model Store – Secure, version-controlled model management.
- Model Publishing – Deploy inference engines like vLLM and Infinity with API endpoints.
- Data Indexing & Retrieval – Knowledge Base creation with embeddings and vector databases.
- Agent Builder – Rapid design of RAG applications for real-world use cases.
Why VMware Private AI Services Matter #
By integrating NVIDIA-powered AI with on-premises control, VMware VCF 9.0 Private AI Services enable enterprises to:
- Maintain data sovereignty while using private data.
- Deploy AI workloads efficiently without depending solely on public clouds.
- Support GenAI and LLM applications securely at scale.
Final Thoughts #
The new VMware Private AI Services in VCF 9.0 deliver a cloud-like experience for on-premises AI, making it easier than ever for data scientists, LLM Ops engineers, and developers to build secure, scalable, and enterprise-ready AI applications.
As AI adoption accelerates, VMware’s Private AI Services provide the foundation for organizations seeking to harness the power of LLMs and RAG applications while keeping control of their data, infrastructure, and compliance requirements.