Privacy Mode
Privacy Mode lets customers use Nucleus without sensitive raw data ever leaving their servers. With Privacy Mode, you can submit URLs to Nucleus that link to raw data assets like images or point clouds, instead of transferring that data to Scale. Access control is then completely in the hands of users: URLs may optionally be protected behind your corporate VPN or an IP whitelist. When you load a Nucleus web page, your browser will directly fetch the raw data from your servers without it ever being accessible to Scale.
Privacy Mode is enabled at the dataset level, and every item in the dataset is treated as being in Privacy Mode.
Enabling Privacy Mode
Simply set use_privacy_mode=True on dataset creation:
import nucleus
API_KEY = "YOUR_SCALE_API_KEY"  # replace with your Scale API key
client = nucleus.NucleusClient(API_KEY)
private_dataset = client.create_dataset(
    name="my dataset",
    use_privacy_mode=True,
)
On Video Datasets
For Privacy Mode in video datasets, you must provide a video_location, a frame_rate, and an items array of DatasetItems with URLs to each frame of your video.
from nucleus import VideoScene, DatasetItem
# One URL per frame; "..." stands for the frames in between
frame_urls = [
    "https://link_to_url_of_frame_0/that/only/i/can_access.jpeg",
    ...
    "https://link_to_url_of_frame_N/that/only/i/can_access.jpeg",
]
reference_ids = ["video-1-frame-0", ..., "video-1-frame-N"]
# Build one DatasetItem per frame, pairing each URL with its reference_id
dataset_items = []
for url, ref_id in zip(frame_urls, reference_ids):
    item = DatasetItem(image_location=url, reference_id=ref_id)
    dataset_items.append(item)
scene = VideoScene(
    reference_id="video-1",
    video_location="https://link_to_url_of_video/that/only/i/can_access.mp4",
    frame_rate=5,
    items=dataset_items,
)
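The "..." placeholders in the frame lists above stand for one entry per frame. As a minimal sketch, assuming your frames follow a predictable naming scheme, the two lists could be generated instead of written by hand. The build_frame_lists helper, the host name, and the frame_<i>.jpeg pattern are all hypothetical and are not part of the Nucleus SDK; adapt them to however your own storage names frames.

```python
from typing import List, Tuple


def build_frame_lists(
    base_url: str, video_id: str, num_frames: int
) -> Tuple[List[str], List[str]]:
    """Return (frame_urls, reference_ids) for a video with num_frames frames.

    Assumes (hypothetically) that frame i lives at <base_url>/frame_<i>.jpeg.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    base = base_url.rstrip("/")
    frame_urls = [f"{base}/frame_{i}.jpeg" for i in range(num_frames)]
    reference_ids = [f"{video_id}-frame-{i}" for i in range(num_frames)]
    return frame_urls, reference_ids


# Hypothetical host and path; replace with URLs that only you can access.
frame_urls, reference_ids = build_frame_lists(
    "https://my-private-host.example/videos/video-1", "video-1", 3
)
```

The resulting lists can be passed to the zip loop above unchanged, since both have exactly one entry per frame in matching order.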
Model Embeddings and Privacy Mode
Certain Nucleus features, like similarity search and Autotag, depend on having model embeddings for the data in your Nucleus datasets. To support these features in conjunction with Privacy Mode, Nucleus offers two options:
- Custom embedding upload: You provide model embeddings for your DatasetItems. In this case, Scale never needs access to your raw data. For details on how to do this see here.
- One-time Scale embedding generation: We use our pretrained models to generate embeddings on your data once, then ensure your raw data is permanently deleted from Scale’s servers, and set items to be in Privacy Mode.
In both cases, Scale never retains raw customer data. Scale stores and indexes only metadata (labels, model predictions, and optional metadata attributes that you upload) and avoids sensitive raw data. Note: scene-level embeddings are not currently supported; frame-level image embeddings are.
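For the custom embedding upload option, it can help to sanity-check embeddings locally before uploading them. The following is a sketch only: it assumes embeddings are keyed by each item's reference_id and serialized as JSON, which is an assumption about the payload and not a confirmed Nucleus format. build_embedding_payload is a hypothetical helper, not an SDK function; the embedding documentation referenced above describes the actual upload call.

```python
import json
import math
from typing import Dict, List


def build_embedding_payload(
    embeddings: Dict[str, List[float]], embedding_dim: int
) -> str:
    """Validate per-item embeddings and serialize them as a JSON object.

    Keys are reference_ids (assumed layout); values are fixed-length vectors.
    """
    for ref_id, vector in embeddings.items():
        if len(vector) != embedding_dim:
            raise ValueError(
                f"{ref_id}: expected {embedding_dim} values, got {len(vector)}"
            )
        if not all(math.isfinite(v) for v in vector):
            raise ValueError(f"{ref_id}: embedding contains NaN or infinity")
    return json.dumps(embeddings)


# Illustrative vectors only; real embeddings come from your own model.
payload = build_embedding_payload(
    {"video-1-frame-0": [0.1, 0.2, 0.3], "video-1-frame-1": [0.0, -0.5, 1.0]},
    embedding_dim=3,
)
```

Checking dimensions and finiteness on your side keeps malformed vectors from reaching the index, and it requires no raw data to leave your servers.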
If, after using Nucleus, you have identified a slice of data that you would like to send for labeling, you can use the update=True option in Dataset.append to change your items from private to shared with Scale.
Sending Privacy Mode Data for Labeling
To send your Privacy Mode data for labeling, you need to upload it to Scale servers. You can update your existing Privacy Mode DatasetItems so that they are uploaded to Scale as follows:
from nucleus import DatasetItem, NucleusClient
my_private_image = DatasetItem(
    # assume the link below has been updated so that Scale can access it
    image_location="https://link_to_url/that/only/i/can_access.jpeg",
    # this reference_id already exists in the dataset
    reference_id="image1",
    upload_to_scale=True,
)
dataset = NucleusClient("YOUR_SCALE_API_KEY").get_dataset("YOUR_DATASET_ID")
# update=True overwrites the existing item that has the same reference_id
dataset.append([my_private_image], update=True)