The Mixtral 8x7B model is now supported on xCloud for fine-tuning and inference!


The cloud for generative AI and LLMs

Deploy and scale LLMs in production with accelerated inference to achieve 30x cost savings and 3x lower latency.

Sign up now to receive complimentary fine-tuning for your use case.

Accelerated Inference

Push model inference to its limits with per-model hardware and software optimizations, backed by our robust, production-ready infrastructure featuring auto-scaling, fault recovery, and monitoring.
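
As a minimal sketch of what calling a deployed model can look like (the endpoint URL, payload fields, and response shape below are illustrative assumptions, not the documented xCloud API):

    import requests

    # Hypothetical endpoint of a model deployed on xCloud; the real URL,
    # auth scheme, and payload schema come from your deployment, not this sketch.
    ENDPOINT = "https://your-deployment.example.com/v1/generate"

    response = requests.post(
        ENDPOINT,
        headers={"Authorization": "Bearer <your-api-key>"},
        json={"prompt": "Summarize: ...", "max_tokens": 128},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json())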

10x cheaper than GPT-4
7x faster than GPT-4
30x cost savings on average
3x latency reduction
Open Source Model APIs

Effortlessly fine-tune and deploy open-source models like LLaMA-2-70B. Dedicate your focus to the data while we handle the heavy lifting: setting up the infrastructure, executing fine-tuning, and accelerating models.

xCloud is compatible with OpenAI fine-tuning and supports popular models like LLaMA, Falcon, MPT, GPT, and more.
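
Since the platform is described as compatible with OpenAI fine-tuning, a sketch using the official openai Python client pointed at an assumed xCloud-provided base URL might look like this (the base URL and model identifier are placeholders, not confirmed values):

    from openai import OpenAI

    # Placeholder base_url: xCloud would supply the actual OpenAI-compatible
    # endpoint and API key for your account.
    client = OpenAI(base_url="https://xcloud.example/v1", api_key="<your-xcloud-key>")

    # Upload a JSONL file of training examples, then start a fine-tuning job.
    training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="llama-2-7b",  # hypothetical model identifier
    )
    print(job.id, job.status)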

Fine-tuning Flexibility

You have the freedom to customize every aspect of fine-tuning your LLMs: connect your chosen data source, use your favorite framework, perform data transformations, and write custom code.

xCloud seamlessly integrates into your existing workflow. Use it through the web dashboard, the CLI, or the Python SDK, as sketched below.
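
For illustration only, a fine-tune-and-deploy workflow through the Python SDK could read like the sketch below; the xcloud package name and every class and method in it are assumed for this example, not the actual SDK surface:

    # Illustrative only: "xcloud" and all calls below are assumed names.
    # Consult the xCloud documentation for the real SDK interface.
    import xcloud  # hypothetical package

    client = xcloud.Client(api_key="<your-xcloud-key>")

    job = client.finetune(
        base_model="llama-2-70b",
        dataset="s3://my-bucket/train.jsonl",  # your connected data source
    )
    job.wait()

    deployment = client.deploy(job.model_id, gpus=2, autoscale=True)
    print(deployment.endpoint_url)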

Security and Privacy

Concerned about data privacy or looking to optimize costs with your own hardware?

Deploy xCloud on your preferred public cloud platform (AWS, Azure, or GCP) and rest assured that all your data and models will reside securely within your own VPC.

Unlock new capabilities

Build and deploy LLMs on any number of GPUs

HW Efficient Fine-tuning

Choose when your models are retrained to suit your business needs.

Model Compression

Scale model sizes based on your application requirements to achieve the best balance of quality, latency, and cost.

Inference Optimization

An optimized inference stack delivers maximal throughput with minimal latency and cost.

Enterprise Security

Your data is protected on a private cloud. Only authorized personnel can access your data and your AI.

Testimonials

Stochastic's team have been a delight to work with and the xCloud product has been instrumental for us. We at NinjaTech AI have started to use xCloud right alongside AWS SageMaker, Google's Vertex and also Azure's machine learning studio. More specifically, after using xCloud's Automatic Acceleration technology, we were amazed by how it was able to automatically reduce our latency by quite a bit. Most importantly, the support team behind xCloud is superb and we'll continue to value our partnership with them.

Babak P.

CEO, NinjaTech AI Inc.

FAQs

How does xCloud revolutionize Generative AI development?
xCloud makes Generative AI development easier, faster, and more cost-effective by offering features like fine-tuning flexibility, accelerated performance, collaborative tools like Jupyter Notebooks, and performance benchmarking.
What deployment options are available with xCloud?
You can deploy xCloud in Stochastic's Cloud, in your Virtual Private Cloud (VPC) on providers like GCP, AWS, or Azure, or on-premises with a Kubernetes cluster.
How does xCloud enhance inference performance?
xCloud accelerates inference with a 3x reduction in latency and 30x cost savings compared to standard LLM hosting, supporting features like auto-scaling, dynamic batching, fault recovery, and monitoring.
What are Model APIs in xCloud and how do they work?
Model APIs in xCloud allow effortless fine-tuning and deployment of LLMs. Users need to provide a dataset with prompts and outputs, select a base model, and utilize xCloud’s optimized inference service for deployment.
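
For example, a prompts-and-outputs dataset might be assembled as JSON Lines before upload; the exact field names xCloud expects are an assumption here:

    import json

    # Assumed schema: one JSON object per line with "prompt" and "output" keys.
    examples = [
        {"prompt": "Translate to French: Hello", "output": "Bonjour"},
        {"prompt": "Translate to French: Thank you", "output": "Merci"},
    ]

    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
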
Which Open Source models are supported in xCloud’s Model APIs?
xCloud supports LLaMA (v2) with various size variants: 7B, 13B, and 70B.
Is fine-tuning of LLMs possible with xCloud?
Yes, xCloud offers fine-tuning flexibility, allowing you to use your preferred framework like PyTorch or TensorFlow and write custom code for model fine-tuning.
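
As a minimal sketch of the kind of custom code this enables, here is a bare-bones PyTorch fine-tuning step over a Hugging Face causal LM; the model name and hyperparameters are illustrative choices, not xCloud defaults:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "meta-llama/Llama-2-7b-hf"  # illustrative base model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    batch = tokenizer(["Example training text."], return_tensors="pt")
    # For causal-LM fine-tuning, the labels are the input ids themselves.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
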
How can I experiment with LLMs using xCloud?
You can experiment with LLMs using Jupyter Notebooks provided by xCloud, which offer a dynamic, collaborative environment for faster model development, experimentation, debugging, and documentation.
What is the pricing structure of xCloud?
Pricing varies based on deployment: license-based pricing for VPC or On-Premises deployments and machine type-based pricing for deployments in Stochastic’s Cloud. For detailed pricing, refer to the xCloud Pricing Details.
How can I contact the xCloud support team?
Contact us via email at xcloud-support@stochastic.ai or schedule a demo with us for support.
How do I get started with using xCloud?
Start by exploring the in-depth documentation, visiting the landing page to learn more about xCloud’s offerings, and logging in to the application.