Training
Training starts with data. We're going to use the Hugging Face Hub and start with the "hello world" dataset of machine learning, MNIST.
Let's start with downloading MNIST from the Hugging Face Hub. This requires the `hf-hub` crate:
```bash
cargo add hf-hub
```
This is going to be very hands-on for now.
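As a minimal sketch of that first step, the snippet below fetches one parquet file per split and opens it with the `parquet` crate (added via `cargo add parquet`). The dataset id `"mnist"` and the file paths inside the `refs/convert/parquet` branch are assumptions here; check the dataset's file listing on the Hub if they differ.

```rust
use std::fs::File;

use hf_hub::{api::sync::Api, Repo, RepoType};
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::SerializedFileReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Talk to the Hub and point at the dataset's pre-converted parquet branch.
    let api = Api::new()?;
    let repo = Repo::with_revision(
        "mnist".to_string(),
        RepoType::Dataset,
        "refs/convert/parquet".to_string(),
    );
    let repo = api.repo(repo);

    // Download one parquet file per split (cached locally after the first run).
    // These relative paths are assumptions; list the branch's files to confirm them.
    let train_path = repo.get("mnist/train/0000.parquet")?;
    let test_path = repo.get("mnist/test/0000.parquet")?;

    // Open the files; these readers are our handles to the data.
    let train_parquet = SerializedFileReader::new(File::open(train_path)?)?;
    let test_parquet = SerializedFileReader::new(File::open(test_path)?)?;

    println!("train rows: {}", train_parquet.metadata().file_metadata().num_rows());
    println!("test rows: {}", test_parquet.metadata().file_metadata().num_rows());
    Ok(())
}
```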
This uses the standardized parquet files from the `refs/convert/parquet` branch on every dataset. Our handles are now `parquet::file::serialized_reader::SerializedFileReader` instances.
We can then inspect the content of the files.
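A sketch of such an inspection helper, assuming a recent `parquet` crate in which the row iterator yields `Result<Row, _>` items:

```rust
use parquet::errors::ParquetError;
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::SerializedFileReader;

// Print every (column, value) pair of every row in an opened parquet file.
fn dump_rows(reader: &SerializedFileReader<std::fs::File>) -> Result<(), ParquetError> {
    for row in reader.get_row_iter(None)? {
        let row = row?;
        for (idx, (name, field)) in row.get_column_iter().enumerate() {
            println!("Column id {idx}, name {name}, value {field}");
        }
    }
    Ok(())
}
```

Calling it on the handles from the previous sketch, e.g. `dump_rows(&test_parquet)?;`, prints one line per column of every row.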
You should see something like:
```text
Column id 1, name label, value 6
Column id 0, name image, value {bytes: [137, ....]
Column id 1, name label, value 8
Column id 0, name image, value {bytes: [137, ....]
```
So each row contains two columns (image, label), with the image stored as bytes. Let's put them into a useful struct.
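As a hypothetical first cut (the field names, types, and the nested layout of the `image` column are assumptions based on the output above), the struct could keep the raw encoded bytes next to the label:

```rust
use parquet::record::{Field, Row};

/// One MNIST example: the encoded image bytes and its digit label.
#[derive(Debug)]
struct MnistItem {
    image_bytes: Vec<u8>,
    label: i64,
}

impl MnistItem {
    /// Build an item from a row with an `image` group column and a `label` column.
    fn from_row(row: &Row) -> Option<Self> {
        let mut image_bytes = None;
        let mut label = None;
        for (name, field) in row.get_column_iter() {
            match (name.as_str(), field) {
                // The image column is a nested record like {bytes: [...], path: ...};
                // pull out its `bytes` field.
                ("image", Field::Group(nested)) => {
                    for (sub_name, sub_field) in nested.get_column_iter() {
                        if let ("bytes", Field::Bytes(b)) = (sub_name.as_str(), sub_field) {
                            image_bytes = Some(b.data().to_vec());
                        }
                    }
                }
                // The label's physical type may be int32 or int64 depending on the conversion.
                ("label", Field::Int(v)) => label = Some(*v as i64),
                ("label", Field::Long(v)) => label = Some(*v),
                _ => {}
            }
        }
        Some(Self {
            image_bytes: image_bytes?,
            label: label?,
        })
    }
}
```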