Storage Locations in Synapse¶
Storage locations allow you to configure where files uploaded to Synapse are stored. By default, files are stored in Synapse's internal S3 storage, but you can configure projects or folders to use your own AWS S3 buckets, Google Cloud Storage buckets, or other external storage.
This tutorial demonstrates how to use the Python client to manage storage locations using the new object-oriented models.
Read more about Custom Storage Locations
Tutorial Purpose¶
In this tutorial you will:
- Create an external S3 storage location and assign it to a folder
- Create a Google Cloud Storage location and assign it to a folder
- Create an SFTP storage location and assign it to a folder
- Create an HTTPS storage location and assign it to a folder
- Create an External Object Store location and assign it to a folder
- Create a Proxy storage location, register a proxy file handle, and assign it to a folder
- Retrieve and inspect storage location settings
- Update a storage location (create a replacement and reassign)
- Index and migrate files to a new storage location
Prerequisites¶
- Make sure that you have completed the Installation and Authentication setup.
- You must have a Project created; replace the project name used in this tutorial with your own.
- An AWS S3 bucket properly configured for use with Synapse, including an owner.txt file. See Custom Storage Locations.
- (Optional) boto3 installed for STS credential examples.
- For SFTP: pysftp installed (pip install "synapseclient[pysftp]").
- For Object Store: AWS credentials configured in your environment.
- For Proxy: a running proxy server and its shared secret key.
Understanding Storage Location Types¶
Synapse supports several types of storage locations:
- SYNAPSE_S3: Synapse-managed S3 storage (default)
- EXTERNAL_S3: User-owned AWS S3 bucket, accessed by Synapse on your behalf. Synapse transfers the data for uploads and downloads. Requires an owner.txt file in the bucket to verify ownership.
- EXTERNAL_GOOGLE_CLOUD: User-owned Google Cloud Storage bucket
- EXTERNAL_SFTP: External SFTP server
- EXTERNAL_HTTPS: External HTTPS server (uploading via the client is not currently supported)
- EXTERNAL_OBJECT_STORE: An S3-compatible store (e.g., MinIO, OpenStack Swift) that Synapse does not access. The client transfers data directly to the object store using credentials configured in your environment; Synapse only stores the file metadata.
- PROXY: A proxy server that controls access to the underlying storage
Storage Location Settings¶
Each storage type exposes a different set of configuration fields on
StorageLocation. When you retrieve a stored location, only the fields
relevant to its type are populated:
| Type | Key fields |
|---|---|
| SYNAPSE_S3 | base_key, sts_enabled |
| EXTERNAL_S3 | bucket, base_key, sts_enabled, endpoint_url |
| EXTERNAL_GOOGLE_CLOUD | bucket, base_key |
| EXTERNAL_SFTP / EXTERNAL_HTTPS | url, supports_subfolders |
| EXTERNAL_OBJECT_STORE | bucket, endpoint_url |
| PROXY | proxy_url, secret_key, benefactor_id |
Common attributes are: concrete_type, storage_location_id, storage_type, upload_type, banner, description, etag, created_on, created_by
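As a minimal sketch of how these fields are used (the bucket name and base key below are placeholders, and the bucket must already contain an owner.txt file as described in the prerequisites), an STS-enabled external S3 location can be created and its common attributes inspected after store():
from synapseclient.models import StorageLocation, StorageLocationType
# Sketch with placeholder values: an external S3 location with STS enabled so
# that temporary AWS credentials can later be requested for direct bucket access.
sts_location = StorageLocation(
    storage_type=StorageLocationType.EXTERNAL_S3,
    bucket="my-synapse-bucket",
    base_key="sts-data",
    sts_enabled=True,
).store()
# Common attributes are populated once the location exists in Synapse.
print(sts_location.storage_location_id)
print(sts_location.storage_type)
print(sts_location.created_on)
print(sts_location.etag)
Once a folder is assigned to an STS-enabled location, temporary AWS credentials for the bucket can be requested through the client (for example with syn.get_sts_storage_token), which is why boto3 is listed as an optional prerequisite.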
1. Set up and get project¶
import os
import synapseclient
from synapseclient.models import Folder, Project, StorageLocation, StorageLocationType
syn = synapseclient.login()
# Step 1: Retrieve the project
my_project = Project(name="My uniquely named project about Alzheimer's Disease").get()
2. Create an external S3 storage location¶
Create a storage location backed by your own S3 bucket. The bucket must be
properly configured with an owner.txt file. Synapse will transfer data
directly to and from this bucket on your behalf.
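If the bucket is not yet configured, the owner.txt file can be written with boto3 (listed as an optional prerequisite). This is only a sketch; the bucket name, base key, and username below are placeholders, and the exact ownership requirements are described in the Custom Storage Locations documentation.
import boto3
# Sketch with placeholder values: write an owner.txt containing your Synapse
# username (or user ID) under the base key so Synapse can verify ownership.
s3_client = boto3.client("s3")
s3_client.put_object(
    Bucket="my-synapse-bucket",
    Key="synapse-data/owner.txt",
    Body=b"my-synapse-username",
)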
MY_BUCKET_NAME = "my-synapse-bucket"
MY_BASE_KEY = "synapse-data"
external_s3_storage_location = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_S3,
bucket=MY_BUCKET_NAME,
base_key=MY_BASE_KEY,
description="External S3 storage location",
).store()
print(f"Created storage location: {external_s3_storage_location.storage_location_id}")
print(f"storage location type: {external_s3_storage_location.storage_type}")
You'll notice the output looks like:
Created storage location: 12345
storage location type: StorageLocationType.EXTERNAL_S3
3. Set up a folder with external S3 storage¶
Create a folder and assign it the S3 storage location. All files uploaded into this folder will be stored in your S3 bucket.
external_s3_folder = Folder(name="my-folder-for-external-s3", parent_id=my_project.id)
external_s3_folder = external_s3_folder.store()
# Set the storage location for the folder
external_s3_folder.set_storage_location(
storage_location_id=external_s3_storage_location.storage_location_id
)
You'll notice the output looks like:
ProjectSetting(id=..., project_id=..., settings_type='upload', locations=[...], concrete_type='org.sagebionetworks.repo.model.project.UploadDestinationListSetting', etag='...')
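To confirm the setting takes effect, you can upload a file into the folder; it should land in your S3 bucket under the configured base key. A minimal sketch, assuming a local file exists at the placeholder path:
from synapseclient.models import File
# Sketch: files stored under this folder are written to the external S3 bucket
# rather than Synapse-managed storage. The local path is a placeholder.
uploaded_file = File(
    path="/path/to/local/data.csv",
    parent_id=external_s3_folder.id,
).store()
print(f"Uploaded file: {uploaded_file.id}")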
4. Create a Google Cloud Storage location¶
Create a storage location backed by a Google Cloud Storage bucket and assign it to a folder.
MY_GCS_BUCKET = "my-gcs-bucket"
MY_GCS_BASE_KEY = "synapse-data"
gcs_storage = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_GOOGLE_CLOUD,
bucket=MY_GCS_BUCKET,
base_key=MY_GCS_BASE_KEY,
description="External Google Cloud Storage location",
).store()
print(f"Created GCS storage location: {gcs_storage.storage_location_id}")
print(f"storage location type: {gcs_storage.storage_type}")
gcs_folder = Folder(name="my-folder-for-gcs", parent_id=my_project.id)
gcs_folder = gcs_folder.store()
# Set the storage location for the folder
gcs_folder.set_storage_location(storage_location_id=gcs_storage.storage_location_id)
5. Create an SFTP storage location¶
SFTP storage locations point to an external SFTP server where files are stored outside of Synapse; Synapse itself only manages the metadata and never touches the data. Once the folder is configured, the Python client performs the SFTP transfers for uploads and downloads, which requires the pysftp package.
MY_SFTP_URL = "sftp://your-sftp-server.example.com/upload"
sftp_storage = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_SFTP,
url=MY_SFTP_URL,
supports_subfolders=True,
description="External SFTP server",
).store()
print(f"Created SFTP storage location: {sftp_storage.storage_location_id}")
print(f"storage location type: {sftp_storage.storage_type}")
sftp_folder = Folder(name="my-folder-for-sftp", parent_id=my_project.id)
sftp_folder = sftp_folder.store()
# Set the storage location for the folder
sftp_folder.set_storage_location(storage_location_id=sftp_storage.storage_location_id)
6. Create an HTTPS storage location¶
EXTERNAL_HTTPS uses the same underlying API type as EXTERNAL_SFTP but is
used when the external server is accessed over HTTPS. Note that the Python
client does not yet support uploading files to HTTPS storage locations. To add files that already exist on the HTTPS server, use the Synapse REST API directly (a sketch follows the code below).
MY_HTTPS_URL = "https://my-https-server.example.com"
https_storage = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_HTTPS,
url=MY_HTTPS_URL,
description="External HTTPS server",
).store()
print(f"Created HTTPS storage location: {https_storage.storage_location_id}")
print(f"storage location type: {https_storage.storage_type}")
my_https_folder = Folder(name="my-folder-for-https", parent_id=my_project.id)
my_https_folder = my_https_folder.store()
# Set the storage location for the folder
my_https_folder.set_storage_location(
storage_location_id=https_storage.storage_location_id
)
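Because the client cannot upload to HTTPS locations, files that already exist on the server are registered by creating an ExternalFileHandle against the REST file services and then storing a File that points at it. The sketch below is an assumption based on the public REST API; the file URL, name, and endpoint usage are placeholders to adapt to your server:
import json
from synapseclient.models import File
# Sketch: register a file already hosted on the HTTPS server by creating an
# ExternalFileHandle, then attach it to the folder via data_file_handle_id.
# The URL and file name are placeholders.
external_file_handle = syn.restPOST(
    "/externalFileHandle",
    body=json.dumps(
        {
            "concreteType": "org.sagebionetworks.repo.model.file.ExternalFileHandle",
            "externalURL": f"{MY_HTTPS_URL}/data/example.csv",
            "fileName": "example.csv",
            "storageLocationId": https_storage.storage_location_id,
        }
    ),
    endpoint=syn.fileHandleEndpoint,
)
https_file = File(
    name="example.csv",
    parent_id=my_https_folder.id,
    data_file_handle_id=external_file_handle["id"],
).store()
print(f"Registered HTTPS-backed file: {https_file.id}")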
7. Create an External Object Store storage location¶
Use EXTERNAL_OBJECT_STORE for S3-compatible stores that are not directly
accessed by Synapse. Unlike EXTERNAL_S3, the Python client transfers data
directly to the object store using locally configured AWS credentials —
Synapse is never involved in the data transfer, only in storing the metadata.
Configure your AWS credentials using any method supported by the AWS SDK
(environment variables, ~/.aws/credentials, IAM roles, etc.). See the
AWS documentation on credential configuration
for details.
Once credentials are configured, add a matching profile section to ~/.synapseConfig
so the client knows which profile to use for a given endpoint and bucket:
[https://s3.us-east-1.amazonaws.com/test-external-object-store]
profile_name = my-s3-profile
MY_OBJECT_STORE_BUCKET = "test-external-object-store"
MY_OBJECT_STORE_ENDPOINT_URL = "https://s3.us-east-1.amazonaws.com"
object_store_storage = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_OBJECT_STORE,
bucket=MY_OBJECT_STORE_BUCKET,
endpoint_url=MY_OBJECT_STORE_ENDPOINT_URL,
description="External S3-compatible object store",
).store()
print(f"Created object store location: {object_store_storage.storage_location_id}")
print(f"storage location type: {object_store_storage.storage_type}")
# create a folder with the object store storage location
object_store_folder = Folder(name="my-folder-for-object-store", parent_id=my_project.id)
object_store_folder = object_store_folder.store()
# Set the storage location for the folder
object_store_folder.set_storage_location(
storage_location_id=object_store_storage.storage_location_id
)
8. Create a Proxy storage location¶
Proxy storage locations delegate file access to a proxy server that controls
authentication and access to the underlying storage. Files are registered by creating a ProxyFileHandle via the REST API; the resulting file handle can then be attached by storing a File with its data_file_handle_id (a sketch follows the code below).
# Replace with your proxy server URL and provide the shared secret key via the
# MY_PROXY_SECRET_KEY environment variable.
MY_PROXY_URL = "https://my-proxy-server.example.com"
MY_PROXY_SECRET_KEY = os.environ.get("MY_PROXY_SECRET_KEY")
proxy_storage = StorageLocation(
storage_type=StorageLocationType.PROXY,
proxy_url=MY_PROXY_URL,
secret_key=MY_PROXY_SECRET_KEY,
benefactor_id=my_project.id,
description="Proxy-controlled storage",
).store()
print(f"Created proxy storage location: {proxy_storage.storage_location_id}")
print(f" Proxy URL: {proxy_storage.proxy_url}")
print(f" Benefactor ID: {proxy_storage.benefactor_id}")
my_proxy_folder = Folder(name="my-folder-for-proxy", parent_id=my_project.id)
my_proxy_folder = my_proxy_folder.store()
# Set the storage location for the folder
my_proxy_folder.set_storage_location(
storage_location_id=proxy_storage.storage_location_id
)
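A sketch of registering a file that the proxy serves, again an assumption based on the public REST API; the file path, size, MD5, and content type below are placeholders for a real file behind your proxy:
import json
from synapseclient.models import File
# Sketch: create a ProxyFileHandle describing a file exposed by the proxy
# server, then attach it to the folder via data_file_handle_id. All file
# metadata values are placeholders.
proxy_file_handle = syn.restPOST(
    "/externalFileHandle",
    body=json.dumps(
        {
            "concreteType": "org.sagebionetworks.repo.model.file.ProxyFileHandle",
            "fileName": "example.csv",
            "filePath": "/data/example.csv",
            "contentType": "text/csv",
            "contentMd5": "0123456789abcdef0123456789abcdef",
            "contentSize": 1024,
            "storageLocationId": proxy_storage.storage_location_id,
        }
    ),
    endpoint=syn.fileHandleEndpoint,
)
proxy_file = File(
    name="example.csv",
    parent_id=my_proxy_folder.id,
    data_file_handle_id=proxy_file_handle["id"],
).store()
print(f"Registered proxy-backed file: {proxy_file.id}")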
9. Retrieve and inspect storage location settings¶
You can retrieve a storage location by ID. Only fields relevant to the storage type are populated.
retrieved_storage = StorageLocation(
storage_location_id=external_s3_storage_location.storage_location_id
).get()
print(f"Retrieved storage location ID: {retrieved_storage.storage_location_id}")
print(f"Storage type: {retrieved_storage.storage_type}")
print(f"Bucket: {retrieved_storage.bucket}")
print(f"Base key: {retrieved_storage.base_key}")
You'll notice the output looks like:
Retrieved storage location ID: 12345
Storage type: StorageLocationType.EXTERNAL_S3
Bucket: my-synapse-bucket
Base key: synapse-data
10. Update a storage location¶
Storage locations are immutable — individual fields cannot be edited after creation. To "update" a storage location, create a new one with the desired settings and reassign it to the folder or project.
# Example: change the base key of the External S3 storage location used by
# external_s3_folder from MY_BASE_KEY to "synapse-data-v2".
updated_s3_storage_location = StorageLocation(
storage_type=StorageLocationType.EXTERNAL_S3,
bucket=MY_BUCKET_NAME,
base_key="synapse-data-v2",
description="External S3 storage location (updated base key)",
).store()
print(f"New storage location ID: {updated_s3_storage_location.storage_location_id}")
# Reassign the folder to point at the new storage location
external_s3_folder.set_storage_location(
storage_location_id=updated_s3_storage_location.storage_location_id
)
updated_folder_setting = external_s3_folder.get_project_setting()
print(f"Folder now uses storage locations: {updated_folder_setting.locations}")
# Step 10b: Partial update — add a storage location without removing existing ones
#
# `set_storage_location` is a destructive replacement. To append a new location
# while keeping the ones already configured, read the current ProjectSetting,
# append to its `locations` list, and call store() on the setting directly.
setting = external_s3_folder.get_project_setting()
if setting is not None:
setting.locations.append(gcs_storage.storage_location_id)
setting.store()
print(f"Updated locations after partial update: {setting.locations}")
References used in this tutorial¶
- StorageLocation
- StorageLocationType
- Folder
- File
- Project
- syn.login
- Custom Storage Locations Documentation
See also¶
- Storage Location Architecture - In-depth architecture diagrams and design documentation