Introduction

Features

  • Simple syntax that looks and feels like PyTorch (see the tensor sketch after this list).
  • Backends (a device-selection sketch follows this list).
    • Optimized CPU backend with optional MKL support on x86 and Accelerate on macOS.
    • CUDA backend for running efficiently on GPUs, with multi-GPU distribution via NCCL.
    • WASM support, run your models in a browser.
  • Included models.
    • Language Models.
      • LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.
      • Falcon.
      • StarCoder, StarCoder2.
      • Phi 1, 1.5, 2, and 3.
      • Mamba, Minimal Mamba.
      • Gemma v1 2b and 7b+, v2 2b and 9b.
      • Mistral 7b v0.1.
      • Mixtral 8x7b v0.1.
      • StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.
      • Replit-code-v1.5-3B.
      • Bert.
      • Yi-6B and Yi-34B.
      • Qwen1.5, Qwen1.5 MoE.
      • RWKV v5 and v6.
    • Quantized LLMs.
      • Llama 7b, 13b, 70b, as well as the chat and code variants.
      • Mistral 7b, and 7b instruct.
      • Mixtral 8x7b.
      • Zephyr 7b alpha and beta (Mistral-7b based).
      • OpenChat 3.5 (Mistral-7b based).
    • Text to text.
      • T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (grammar correction).
      • Marian MT (Machine Translation).
    • Text to image.
      • Stable Diffusion v1.5, v2.1, XL v1.0.
      • Wurstchen v2.
    • Image to text.
      • BLIP.
      • TrOCR.
    • Audio.
      • Whisper, multi-lingual speech-to-text.
      • EnCodec, audio compression model.
      • MetaVoice-1B, text-to-speech model.
      • Parler-TTS, text-to-speech model.
    • Computer Vision Models.
      • DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT, ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.
      • yolo-v3, yolo-v8.
      • Segment-Anything Model (SAM).
      • SegFormer.
  • File formats: load models from safetensors, npz, ggml, or PyTorch files (a safetensors loading sketch follows this list).
  • Serverless (on CPU), small and fast deployments.
  • Quantization support using the llama.cpp quantized types.
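As a taste of the PyTorch-like syntax, here is a minimal sketch that multiplies two random tensors on the CPU. It assumes the `candle-core` crate (imported as `candle_core`); the calls shown follow candle's basic tensor API.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Run on the plain CPU backend; see the next sketch for GPU selection.
    let device = Device::Cpu;

    // Random tensors, analogous to torch.randn in PyTorch.
    let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;

    // Matrix multiplication; errors surface through Result instead of panics.
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```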
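For the backends listed above, a common pattern is to prefer CUDA when it is available and otherwise fall back to the CPU. A minimal sketch, assuming `Device::cuda_if_available` from `candle_core` (the GPU path is only taken when candle is built with the `cuda` feature and a device is visible):

```rust
use candle_core::{Device, Result};

/// Picks CUDA device 0 when a GPU is usable, otherwise falls back to the CPU backend.
fn pick_device() -> Result<Device> {
    Device::cuda_if_available(0)
}

fn main() -> Result<()> {
    let device = pick_device()?;
    println!("using CUDA: {}", device.is_cuda());
    Ok(())
}
```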
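For the file-format support, the sketch below loads every tensor stored in a safetensors file into a name-to-tensor map. It assumes the `candle_core::safetensors::load` helper; `model.safetensors` is a placeholder path to substitute with a real file.

```rust
use candle_core::{Device, Result};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // Reads all tensors from the file into a HashMap<String, Tensor>.
    // "model.safetensors" is a placeholder path for this sketch.
    let tensors = candle_core::safetensors::load("model.safetensors", &device)?;
    for (name, tensor) in &tensors {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```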

This book will walk you through how to use candle, step by step.