Export Embeddings

Use the Dataset.export_embeddings method to export the embedding vectors of a Dataset's primary search index. The method starts an asynchronous job that uploads the embedding vectors in batches, as JSON files, to Scale blob storage, from which they can be downloaded.

The export_embeddings method returns an EmbeddingsExportJob object, which you can use to monitor the status of the job and retrieve its result.

import nucleus

client = nucleus.NucleusClient("YOUR_SCALE_API_KEY")
dataset = client.get_dataset("YOUR_DATASET_ID")
embeddings_job = dataset.export_embeddings(asynchronous=True)

# Wait for the job to complete, then get the URLs of the exported JSON files
result_urls = embeddings_job.result_urls(wait_for_completion=True)

print(result_urls)  # ['https://scale-temp.s3.dualstack.us-west-2.amazonaws.com/dataset_id_embedding_1.json', ...]
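
Once the URLs are available, the JSON batches can be fetched with any HTTP client. The following is a minimal sketch using only the Python standard library. The helper names download_embedding_batches and load_batch and the default "embeddings" directory are hypothetical and not part of the Nucleus SDK, and the sketch assumes each URL can be read with a plain GET request. The internal layout of each JSON file is not described here, so the sketch only parses the file and leaves its interpretation to the caller.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def download_embedding_batches(urls, dest_dir="embeddings"):
    """Download each exported JSON batch and return the local file paths.

    Hypothetical helper, not part of the Nucleus SDK.
    """
    os.makedirs(dest_dir, exist_ok=True)
    local_paths = []
    for index, url in enumerate(urls):
        # Name the local file after the last path component of the URL,
        # ignoring any query string; fall back to a generated name.
        name = os.path.basename(urlparse(url).path) or f"batch_{index}.json"
        local_path = os.path.join(dest_dir, name)
        with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
            out.write(response.read())
        local_paths.append(local_path)
    return local_paths


def load_batch(path):
    """Parse one downloaded batch; the JSON schema is not assumed here."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With the snippet above, calling download_embedding_batches(result_urls) would save every batch into the local embeddings directory, and load_batch can then be applied to each returned path. Because the files live in temporary storage, it is probably safest to download them soon after the job completes.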