# Introduction

## Features

- Simple syntax, looks and feels like PyTorch (see the minimal sketch below).
- Model training.
- Embed user-defined ops/kernels, such as flash-attention v2.
- Backends.
  - Optimized CPU backend with optional MKL support for x86 and Accelerate for macOS.
  - CUDA backend for efficiently running on GPUs, multiple GPU distribution via NCCL.
  - WASM support, run your models in a browser.
- Included models.
  - Language Models.
    - LLaMA v1, v2, and v3 with variants such as SOLAR-10.7B.
    - Falcon.
    - StarCoder, StarCoder2.
    - Phi 1, 1.5, 2, and 3.
    - Mamba, Minimal Mamba.
    - Gemma v1 2b and 7b+, v2 2b and 9b.
    - Mistral 7b v0.1.
    - Mixtral 8x7b v0.1.
    - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.
    - Replit-code-v1.5-3B.
    - Bert.
    - Yi-6B and Yi-34B.
    - Qwen1.5, Qwen1.5 MoE.
    - RWKV v5 and v6.
  - Quantized LLMs.
    - Llama 7b, 13b, 70b, as well as the chat and code variants.
    - Mistral 7b, and 7b instruct.
    - Mixtral 8x7b.
    - Zephyr 7b a and b (Mistral-7b based).
    - OpenChat 3.5 (Mistral-7b based).
  - Text to text.
    - T5 and its variants: FlanT5, UL2, MADLAD400 (translation), CoEdit (Grammar correction).
    - Marian MT (Machine Translation).
  - Text to image.
    - Stable Diffusion v1.5, v2.1, XL v1.0.
    - Wurstchen v2.
  - Image to text.
    - BLIP.
    - TrOCR.
  - Audio.
    - Whisper, multi-lingual speech-to-text.
    - EnCodec, audio compression model.
    - MetaVoice-1B, text-to-speech model.
    - Parler-TTS, text-to-speech model.
 - Computer Vision Models.
- DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT, ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.
 - yolo-v3, yolo-v8.
 - Segment-Anything Model (SAM).
 - SegFormer.
 
 
- File formats: load models from safetensors, npz, ggml, or PyTorch files (see the loading sketch below).
- Serverless (on CPU), small and fast deployments.
- Quantization support using the llama.cpp quantized types.
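
To give a feel for the PyTorch-like syntax, here is a minimal sketch. It assumes the `candle-core` crate has been added as a dependency; the tensor shapes are arbitrary, and `Device::cuda_if_available` falls back to the CPU backend when no GPU is present.

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Use the CUDA backend if a GPU is available, otherwise run on the CPU.
    let device = Device::cuda_if_available(0)?;

    // Create two random tensors and multiply them, much like torch.randn + matmul.
    let a = Tensor::randn(0f32, 1.0, (2, 3), &device)?;
    let b = Tensor::randn(0f32, 1.0, (3, 4), &device)?;
    let c = a.matmul(&b)?;
    println!("{c}");
    Ok(())
}
```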
 
This book will introduce, step by step, how to use candle.
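
Loading weights from one of the supported file formats is similarly compact. The sketch below uses candle-core's `safetensors::load` helper; the file name `model.safetensors` is just a placeholder for whatever checkpoint you have on disk.

```rust
use candle_core::{safetensors, Device};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    // Returns a HashMap<String, Tensor> keyed by the tensor names stored in the file.
    let tensors = safetensors::load("model.safetensors", &device)?;
    for (name, tensor) in tensors.iter() {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```

Quantized checkpoints in the llama.cpp/GGUF format are covered by the `candle_core::quantized` module rather than this helper.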