Introduction

Features

Simple syntax, looks and feels like PyTorch.
- Model training.
- Embed user-defined ops/kernels, such as flash-attention v2.
Backends.
- Optimized CPU backend with optional MKL support for x86 and Accelerate for Macs.
- CUDA backend for running efficiently on GPUs, with multi-GPU distribution via NCCL.
- WASM support, run your models in a browser.
Included models.
- Language Models.
  - LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.
  - Falcon.
  - StarCoder, StarCoder2.
  - Phi 1, 1.5, 2, and 3.
  - Mamba, Minimal Mamba.
  - Gemma v1 2b and 7b+, v2 2b and 9b.
  - Mistral 7b v0.1.
  - Mixtral 8x7b v0.1.
  - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.
  - Replit-code-v1.5-3B.
  - BERT.
  - Yi-6B and Yi-34B.
  - Qwen1.5, Qwen1.5 MoE.
  - RWKV v5 and v6.
- Quantized LLMs.
  - Llama 7b, 13b, 70b, as well as the chat and code variants.
  - Mistral 7b, and 7b instruct.
  - Mixtral 8x7b.
  - Zephyr 7b alpha and beta (Mistral-7b based).
  - OpenChat 3.5 (Mistral-7b based).
- Text to text.
  - T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (grammar correction).
  - Marian MT (Machine Translation).
- Text to image.
  - Stable Diffusion v1.5, v2.1, XL v1.0.
  - Würstchen v2.
- Image to text.
  - BLIP.
  - TrOCR.
- Audio.
  - Whisper, multi-lingual speech-to-text.
  - EnCodec, audio compression model.
  - MetaVoice-1B, text-to-speech model.
  - Parler-TTS, text-to-speech model.
- Computer Vision Models.
  - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT, ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.
  - yolo-v3, yolo-v8.
  - Segment-Anything Model (SAM).
  - SegFormer.
File formats: load models from safetensors, npz, ggml, or PyTorch files.
Serverless (on CPU), small and fast deployments.
Quantization support using the llama.cpp quantized types.
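
To give a feel for what block quantization involves, here is a minimal sketch in plain Rust (standard library only) of a 4-bit scheme in the spirit of llama.cpp's Q4_0 type. It is not candle's implementation: the `BlockQ4` struct and the `quantize_block` and `dequantize_block` functions are hypothetical names chosen for illustration, and the layout is simplified (an `f32` scale and one code per byte, where real formats typically use an `f16` scale and pack two codes per byte).

```rust
/// Number of weights sharing one scale in this sketch (an assumption
/// modeled on Q4_0-style blocks).
const BLOCK_SIZE: usize = 32;

/// Hypothetical block layout for illustration: one scale and one 4-bit
/// code per weight, stored unpacked for readability.
#[derive(Debug, Clone, PartialEq)]
struct BlockQ4 {
    scale: f32,
    codes: [u8; BLOCK_SIZE], // every code is in 0..=15
}

/// Quantize one block of weights to 4-bit codes.
fn quantize_block(values: &[f32; BLOCK_SIZE]) -> BlockQ4 {
    // Pick the element with the largest magnitude, keeping its sign.
    let mut max = 0.0f32;
    for &v in values {
        if v.abs() > max.abs() {
            max = v;
        }
    }
    // Choose the scale so that this element lands on code 0 (that is, -8
    // once the codes are centered around 8).
    let scale = max / -8.0;
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };

    let mut codes = [0u8; BLOCK_SIZE];
    for (code, &v) in codes.iter_mut().zip(values.iter()) {
        // Shift by 8 to center, add 0.5 and floor to round to nearest,
        // then clamp into the 4-bit range.
        let q = (v * inv + 8.5).floor();
        *code = q.clamp(0.0, 15.0) as u8;
    }
    BlockQ4 { scale, codes }
}

/// Recover approximate weights from a quantized block.
fn dequantize_block(block: &BlockQ4) -> [f32; BLOCK_SIZE] {
    let mut out = [0.0f32; BLOCK_SIZE];
    for (o, &code) in out.iter_mut().zip(block.codes.iter()) {
        *o = (code as f32 - 8.0) * block.scale;
    }
    out
}
```

Calling `quantize_block` followed by `dequantize_block` on a block returns values that differ from the originals by at most the magnitude of the block's scale. That bounded loss is the price paid for storing roughly 4 bits per weight instead of 32.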

This book introduces, step by step, how to use candle.
