Privacy Mode

Privacy Mode lets customers use Nucleus without sensitive raw data ever leaving their servers. With Privacy Mode, you can submit URLs to Nucleus that link to raw data assets like images or point clouds, instead of transferring that data to Scale. Access control is then completely in the hands of users: URLs may optionally be protected behind your corporate VPN or an IP whitelist. When you load a Nucleus web page, your browser will directly fetch the raw data from your servers without it ever being accessible to Scale.

Privacy Mode is enabled at the dataset level: every item in a Privacy Mode dataset is treated as being in Privacy Mode.

Enabling Privacy Mode

Set use_privacy_mode=True when creating the dataset:

import nucleus

# Replace the placeholder with your own Scale API key.
API_KEY = "YOUR_SCALE_API_KEY"
client = nucleus.NucleusClient(API_KEY)

private_dataset = client.create_dataset(
  name="my dataset",
  use_privacy_mode=True,
)

On Video Datasets

For Privacy Mode in video datasets, you must provide a video_location, a frame_rate, and a list items of DatasetItems, each with a URL pointing to one frame of your video.

from nucleus import VideoScene, DatasetItem

frame_urls = [
    "https://link_to_url_of_frame_0/that/only/i/can_access.jpeg",
    ...,
    "https://link_to_url_of_frame_N/that/only/i/can_access.jpeg",
]
reference_ids = ["video-1-frame-0", ..., "video-1-frame-N"]

dataset_items = []
for url, ref_id in zip(frame_urls, reference_ids):
    item = DatasetItem(image_location=url, reference_id=ref_id)
    dataset_items.append(item)

scene = VideoScene(
    reference_id="video-1",
    video_location="https://link_to_url_of_video/that/only/i/can_access.mp4",
    frame_rate=5,
    items=dataset_items,
)
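For longer videos, writing out every frame URL by hand becomes impractical. As a minimal sketch, assuming your frames are stored under a common base URL with a zero-based index in the file name, the two lists used above could be generated programmatically. The build_frame_refs helper, the base URL, and the file naming pattern are all hypothetical and are not part of the Nucleus API.

```python
from typing import List, Tuple


def build_frame_refs(
    base_url: str, video_id: str, num_frames: int
) -> Tuple[List[str], List[str]]:
    """Return parallel lists of frame URLs and reference IDs.

    Hypothetical helper: assumes frames live at
    <base_url>/frame_<index>.jpeg, which is an illustrative layout only.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    base = base_url.rstrip("/")
    frame_urls = [f"{base}/frame_{i}.jpeg" for i in range(num_frames)]
    reference_ids = [f"{video_id}-frame-{i}" for i in range(num_frames)]
    return frame_urls, reference_ids


# Example with a made-up internal host and three frames.
frame_urls, reference_ids = build_frame_refs(
    "https://example.internal/videos/video-1", "video-1", num_frames=3
)
```

The resulting lists can be fed to the zip loop shown above to build the DatasetItems. The reference IDs follow the same video-1-frame-N pattern as the hand-written example.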

Model Embeddings and Privacy Mode

Certain Nucleus features, like similarity search and Autotag, depend on having model embeddings for the data in your Nucleus datasets. To support these features in conjunction with Privacy Mode, Nucleus offers two options:

Custom embedding upload: You provide model embeddings for your DatasetItems. In this case, Scale never needs access to your raw data. For details on how to do this see here.
One-time Scale embedding generation: We use our pretrained models to generate embeddings on your data once, then ensure your raw data is permanently deleted from Scale’s servers, and set items to be in Privacy Mode.

In both cases, Scale never retains raw customer data. Scale stores and indexes only metadata (labels, model predictions, and any optional metadata attributes that you upload) and avoids sensitive raw data. Note: scene-level embeddings are not currently supported; only frame-level (image) embeddings are.
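If you take the custom embedding route, it can help to confirm before uploading that every item has an embedding and that all vectors share a single dimension. The sketch below is only an illustration: the validate_embeddings helper and the choice of a dictionary keyed by reference_id are assumptions, and the upload call itself is omitted because this page does not describe it.

```python
from typing import Dict, List, Sequence


def validate_embeddings(
    embeddings: Dict[str, Sequence[float]], reference_ids: List[str]
) -> int:
    """Check coverage and a consistent dimension; return that dimension.

    Hypothetical helper, not part of the Nucleus client.
    """
    # Every item we plan to index must have an embedding.
    missing = [r for r in reference_ids if r not in embeddings]
    if missing:
        raise ValueError(f"no embedding for reference_ids: {missing}")
    # All vectors must have the same length.
    dims = {len(vec) for vec in embeddings.values()}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()


# Made-up embeddings for two items, keyed by reference_id.
embeddings = {
    "image1": [0.1, 0.2, 0.3],
    "image2": [0.0, 0.5, 0.9],
}
embedding_dim = validate_embeddings(embeddings, ["image1", "image2"])
```

Running a check like this locally keeps malformed vectors from being uploaded, and it needs no access to the raw images themselves.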

If, after using Nucleus, you identify a slice of data that you would like to send for labeling, you can pass update=True to Dataset.append to change those items from private to shared with Scale.

Sending Privacy Mode Data for Labeling

To send your Privacy Mode data for labeling, you must upload it to Scale's servers. You can update your existing Privacy Mode DatasetItems so that they are uploaded to Scale as follows:

from nucleus import DatasetItem, NucleusClient

my_private_image = DatasetItem(
  # Assumes the link below has been updated so that Scale can access it.
  image_location="https://link_to_url/that/only/i/can_access.jpeg",
  # This reference_id already exists in the dataset.
  reference_id="image1",
  upload_to_scale=True,
)

dataset = NucleusClient("YOUR_SCALE_API_KEY").get_dataset("YOUR_DATASET_ID")

dataset.append([my_private_image], update=True)