Dataset.create_custom_index(embeddings_urls: List[str], embedding_dim: int) -> AsyncJob
This endpoint allows users to upload custom image embedding vectors to use as the feature space for Autotag and similarity search.
Taken together, the embedding files must contain an entry for every image in the dataset, and all embeddings must have the same dimension. A good embedding is a strong semantic descriptor of an image, typically the pre-activation output of a neural network layer.
Formatting an Embedding File
The files must be in JSON format, and each file should contain no more than 5,000 embedding vectors.
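Because of the 5,000-vector limit, larger datasets need several files. The sketch below is a minimal example that assumes all embeddings are already held in one Python dict keyed by reference_id. It splits them into chunks and writes one JSON file per chunk. The helper names, the file-naming scheme, and the local output directory are hypothetical, and hosting the files at the URLs passed as embeddings_urls is not covered here.

```python
import json
import os
from typing import Dict, List

# Per-file limit stated in the documentation.
MAX_VECTORS_PER_FILE = 5000


def chunk_embeddings(
    embeddings: Dict[str, List[float]], max_per_file: int = MAX_VECTORS_PER_FILE
) -> List[Dict[str, List[float]]]:
    """Split one reference_id -> vector mapping into chunks of at most max_per_file entries."""
    items = list(embeddings.items())
    return [
        dict(items[start : start + max_per_file])
        for start in range(0, len(items), max_per_file)
    ]


def write_embedding_files(
    embeddings: Dict[str, List[float]], out_dir: str, prefix: str = "embeddings"
) -> List[str]:
    """Write each chunk to its own local JSON file (hypothetical naming) and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunk_embeddings(embeddings)):
        path = os.path.join(out_dir, f"{prefix}_{i:04d}.json")
        with open(path, "w") as f:
            json.dump(chunk, f)
        paths.append(path)
    return paths
```

For a dataset of 12,000 items, chunk_embeddings returns three chunks holding 5,000, 5,000, and 2,000 entries.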
Each file holds a JSON object in which each key is a string reference_id uniquely identifying a DatasetItem, and each value is that item's embedding as a list of floats. NumPy arrays are not JSON-serializable, so convert them to lists (for example, with the array's tolist() method) before writing the file.
The vectors must all have the same dimension (length).
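A minimal sketch of producing one file's JSON text from a mapping of reference_id to vector, assuming each vector is either a NumPy array or a plain Python sequence of numbers. The function names are hypothetical. The code calls tolist() when an object provides it (as NumPy arrays do) and rejects any vector whose length differs from the first one.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


def to_float_list(vector: Any) -> List[float]:
    """Convert a NumPy array (or any sequence of numbers) into a plain list of floats."""
    # NumPy arrays are not JSON-serializable; .tolist() turns them into Python lists.
    values = vector.tolist() if hasattr(vector, "tolist") else list(vector)
    return [float(x) for x in values]


def serialize_embeddings(embeddings: Mapping[str, Any]) -> str:
    """Build the reference_id -> list-of-floats JSON text, enforcing one shared length."""
    payload: Dict[str, List[float]] = {}
    expected_dim: Optional[int] = None
    for reference_id, vector in embeddings.items():
        values = to_float_list(vector)
        if expected_dim is None:
            expected_dim = len(values)
        elif len(values) != expected_dim:
            raise ValueError(
                f"{reference_id!r} has length {len(values)}, expected {expected_dim}"
            )
        payload[str(reference_id)] = values
    return json.dumps(payload)
```

Passing a float32 NumPy array works because tolist() returns ordinary Python floats, which json.dumps can serialize.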
The entries must be exhaustive: every DatasetItem in the dataset must appear as a key. For example, if your dataset contains 5,000 images, the JSON objects across all embedding files must together contain 5,000 entries.
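Before calling create_custom_index, it may help to check the complete set of files against the dataset. This sketch assumes you have the dataset's reference IDs as a list of strings and the contents of each embedding file as JSON text; check_embedding_files is a hypothetical helper. It returns the shared vector length, on the assumption (implied but not stated above) that this is the value to pass as embedding_dim. It does not flag keys that match no DatasetItem, since the documentation does not say how those are handled.

```python
import json
from typing import Iterable, List, Set


def check_embedding_files(reference_ids: Iterable[str], file_contents: List[str]) -> int:
    """Check that the JSON files jointly cover every reference_id with equal-length vectors.

    Returns the shared vector length; treating it as embedding_dim is an assumption.
    """
    expected: Set[str] = set(reference_ids)
    seen: Set[str] = set()
    dims: Set[int] = set()
    for text in file_contents:
        entries = json.loads(text)
        for reference_id, vector in entries.items():
            seen.add(reference_id)
            dims.add(len(vector))
    missing = expected - seen
    if missing:
        raise ValueError(
            f"missing embeddings for {len(missing)} dataset item(s), e.g. {sorted(missing)[:5]}"
        )
    if len(dims) != 1:
        raise ValueError(f"expected one vector length, found {sorted(dims)}")
    return dims.pop()
```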