RunPod Blog
When to Use (Or Not Use) RunPod's Proxy

RunPod uses a proxy system so you can access your pods easily without making any configuration changes. This proxy utilizes Cloudflare for ease of both implementation and access, which comes with several benefits and drawbacks. Let's go into a little explainer about specifically
13 Nov 2024 3 min read
Comparing Different Quantization Methods: Speed Versus Quality Tradeoffs

Introduction Quantization is a key machine learning technique used to reduce model size and speed up inference, especially when deploying models on hardware with resource constraints. Nevertheless, achieving a good quantization setup means balancing model performance against the computational efficiency required by the deployment environment.
12 Nov 2024 5 min read
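The tradeoff the quantization post describes can be made concrete with a tiny sketch (illustrative only, not code from the article): symmetric int8 quantization of a small weight vector, showing that the memory saving comes at the cost of a bounded reconstruction error.

```python
# Illustrative sketch: symmetric int8 quantization of a weight vector.
# Each float becomes a 1-byte integer plus one shared scale factor.

def quantize_int8(weights):
    """Map floats onto the int8 range [-127, 127] with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error per element is at most ~scale/2."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.99, -0.27, 0.04]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)        # [15, -68, 127, -35, 5] — 1 byte each instead of 2-4
print(max_err)  # small, bounded by roughly half the scale step
```

Lower-bit formats (int4, GGUF k-quants, etc.) push the same lever further: a coarser grid, more speed and less memory, and a larger worst-case error — which is exactly the speed-versus-quality tradeoff the post examines.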
Community Spotlight: How to Build and Deploy an AI Chatbot from Scratch on RunPod

In an extremely generous contribution to the RunPod community, our friends at Code in a Jiffy recently shared their journey of building a complete coffee shop application enhanced with artificial intelligence. This comprehensive project showcases how AI can transform everyday commerce applications into intelligent, interactive experiences. The video is 12
06 Nov 2024 3 min read
Classifier Free Guidance in LLMs - How Does It Work?

Classifier-Free Guidance (CFG) has emerged as a powerful technique for improving the quality and controllability of language model outputs. While initially developed for image generation models, CFG has found successful applications in text generation. Let's dive deep into how this technique works and why it's becoming
04 Nov 2024 9 min read
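The core of classifier-free guidance fits in one line, and a minimal sketch may help before reading the full post (variable names are mine, not the article's): the model is run with and without the conditioning prompt, and the two outputs are blended so that a guidance scale above 1 pushes harder in the direction the prompt suggests.

```python
# Minimal CFG combination rule: guided = uncond + scale * (cond - uncond).
# scale=0 ignores the prompt, scale=1 recovers plain conditional output,
# scale>1 extrapolates past it, amplifying the prompt's influence.

def cfg_logits(uncond, cond, scale):
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 1.0, 2.0]  # logits without the prompt
cond   = [1.0, 1.0, 0.0]  # logits with the prompt

print(cfg_logits(uncond, cond, 1.0))  # [1.0, 1.0, 0.0] — plain conditional
print(cfg_logits(uncond, cond, 2.0))  # [2.0, 1.0, -2.0] — prompt-aligned tokens boosted
```

In image models this rule is applied to predicted noise; in LLMs it is applied to token logits, which is the adaptation the post digs into.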
Mochi 1 Text-To-Video Represents New SOTA In Open Source Video Gen

Text-to-video generation is a space where open source has lagged behind for some time, due to the difficulty and cost involved in training and evaluating video as opposed to text and images. Offerings such as Sora, while impressive, beg for open-source alternatives where you can create videos of any kind
28 Oct 2024 4 min read
Stability.ai Releases Stable Diffusion 3.5 - What's New in the Latest Generation?

On October 22, Stability.AI released its latest version of Stable Diffusion, SD3.5. There are currently two versions out (Large and Large Turbo), with the former geared towards quality and the latter towards efficiency. Next week, the Medium version will release, aimed at smaller GPU specs. You can quickly and easily
24 Oct 2024 4 min read
NVIDIA's Llama 3.1 Nemotron 70b Instruct: Can It Handle My Unsolved LLM Problem?

Earlier this month, NVIDIA released Llama 3.1 Nemotron Instruct, a 70b model that has taken some notably high spots on various leaderboards, seeming to punch far above its weight. As of October 14th, it is not only beating high-end closed-source models that far outweigh it, like Claude 3
18 Oct 2024 11 min read
How to Code Directly With Stable Diffusion Within Python On RunPod

While there are many useful front ends for prompting Stable Diffusion, in some ways it can be easier to simply use it directly within Jupyter Notebook, which comes pre-installed in many RunPod templates. Once you spin up a pod you get instant access to Jupyter as well, allowing you to directly
14 Oct 2024 7 min read
Why LLMs Can't Spell 'Strawberry' And Other Odd Use Cases

Picture this: You've got an AI language model - let's call it Bahama-3-70b - who can write sonnets, explain quantum physics, and even crack jokes. But ask it to count the r's in "strawberry," and suddenly it's like a toddler
01 Oct 2024 3 min read
How to Easily Work with GGUF Quantizations In KoboldCPP
Text Generation

Everyone wants more bang for their buck when it comes to their business expenditures, and we want to ensure you have as many options as possible. Although you could certainly load full-weight fp16 models, it turns out that you may not actually need that level of precision, and it may
25 Sep 2024 6 min read
Introducing Better Launcher: Spin Up New Stable Diffusion Pods Quicker Than Before
Image Generation

Our very own Madiator2011 has done it again with the release of Better Forge, a streamlined template that lets you spin up an instance with a minimum of fuss. One fairly consistent piece of feedback brought up by RunPod users is how long it takes to start up an image
20 Sep 2024 5 min read
Use RunPod Serverless To Run Very Large Language Models Securely and Privately

As discussed previously, a human interacting with a chatbot is one of the prime use cases for RunPod serverless functions. Because most of the elapsed time is on the human's end, where they are reading, processing, and responding, the GPU sits idle for the vast majority
18 Sep 2024 5 min read
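The idle-GPU point behind that post is easy to quantify with a back-of-envelope sketch (the per-turn numbers below are illustrative assumptions, not figures from the article):

```python
# Rough model of a chat session: the GPU only works while generating,
# and waits while the human reads, thinks, and types.

gen_seconds_per_turn = 4    # assumed GPU time to generate one reply
read_seconds_per_turn = 56  # assumed human read/type time per turn
turns = 10

busy = gen_seconds_per_turn * turns
total = (gen_seconds_per_turn + read_seconds_per_turn) * turns
utilization = busy / total

print(f"GPU busy {utilization:.0%} of the session")  # GPU busy 7% of the session
```

Under these assumptions the GPU computes for only about 7% of the session, which is why per-second serverless billing that scales to zero between requests can be far cheaper for chat than a pod that bills for the full hour.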
Evaluate Multiple LLMs Simultaneously in a Flash with ollama

Imagine you are a studio manager tasked with serving up a creative writing assistant to your users, and are directed to select only a few best candidates to run on endpoints to keep the project maintainable and within scope. As of the writing of this article, there are more than
13 Sep 2024 15 min read
Optimize Your vLLM Deployments on RunPod with GuideLLM

As a RunPod user, you're already leveraging the power of GPU cloud computing for your machine learning projects. But are you getting the most out of your vLLM deployments? Enter GuideLLM, a powerful tool that can help you evaluate and optimize your Large Language Model (LLM) deployments for
10 Sep 2024 2 min read
RunPod Weekly #17 - Pricing Updates, SGLang Worker (Beta), Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 📈 Pricing Updates We've been running a temporary promotion for A40 48GB GPUs, known for their exceptional combination of vRAM, performance, and pricing. We've been thrilled to see the amazing products
30 Aug 2024 3 min read
Run Gemma 7b with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Google’s Gemma 7B * A step-by-step guide to get Google Gemma 7B up and running on RunPod Serverless with the quick deploy vLLM worker.
22 Aug 2024 5 min read
Run Llama 3.1 with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Meta's Llama 3.1 * A step-by-step guide to get Meta Llama 3.1's 8b-instruct version up and running on RunPod
20 Aug 2024 7 min read
RunPod Weekly #16 - Serverless Improvements, Llama 3.1 on vLLM, Better RAG Support, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: ✨ Serverless Improvements Our workers view has been revamped to give a more in-depth overview of each worker, where it's located, and its current state. You can now also expose HTTP
16 Aug 2024 2 min read
Supercharge Your LLMs Using SGLang For Inference: Why Speed and Efficiency Matter More Than Ever

RunPod is proud to partner with LMSys once again to put a spotlight on its inference engine SGLang. LMSys has a storied history within the realm of language models with prior contributions such as the Chatbot Arena which compares outputs from competing models, Vicuna, an open source competitor to ChatGPT,
15 Aug 2024 6 min read
How to Run Flux Image Generator with ComfyUI

What is Flux? Flux is an innovative text-to-image AI model developed by Black Forest Labs that has quickly gained popularity among generative AI enthusiasts and digital artists. Its ability to generate high-quality images from simple text prompts sets it apart. The Flux 1 family includes three versions of their image
13 Aug 2024 5 min read
How to run Flux image generator with RunPod

What is Flux? Flux is a new and exciting text-to-image AI model developed by Black Forest Labs. This innovative model family has quickly captured the attention of generative AI enthusiasts and digital artists alike, thanks to its remarkable ability to generate high-quality images from simple text prompts. The Flux 1 family
08 Aug 2024 6 min read
RunPod Weekly #15 - New Referral Program, Community Changelog, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 🤝 New Referral Program We've reworked our referral program to make it easier (and more lucrative) for anyone to get started. These changes include higher reward rates, a new serverless referral program, no
02 Aug 2024 3 min read
How to run SAM 2 on a cloud GPU with RunPod

What is SAM 2? Meta has unveiled Segment Anything Model 2 (SAM 2), a revolutionary advancement in object segmentation. Building on the success of its predecessor, SAM 2 integrates real-time, promptable object segmentation for both images and videos, enhancing accuracy and speed. Its ability to operate across previously unseen visual
02 Aug 2024 6 min read
Run Llama 3.1 405B with Ollama: A Step-by-Step Guide

Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model not only matches but even surpasses the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K)
29 Jul 2024 5 min read
Master the Art of Serverless Scaling: Optimize Performance and Costs on RunPod

In many sports – golf, baseball, tennis, among others – there is a "sweet spot" to aim for which results in the maximum amount of lift or distance for the ball given an equivalent amount of kinetic energy in the swing. While you'll still get somewhere with an
25 Jul 2024 7 min read
← Newer Posts Page 4 of 9 Older Posts →
RunPod Blog © 2025