
Data Orchestration

Curvine provides a UFS (Unified File System) view to manage all supported distributed storage systems, including S3, HDFS, and others.

Mounting

Curvine can connect multiple UFS sources by mounting each one to a different Curvine path. Curvine has no default UFS configuration, so you must mount a UFS source before you can load data from it.

[Figure: mount architecture]

Curvine persists the mount table in its metadata, so mounts survive a restart and do not need to be recreated. Mounts must follow these rules:

  • Mounting to the root path is not allowed.
  • Mounting another UFS under an already-mounted path is not allowed.
  • A Curvine path cannot be mounted to more than one UFS.
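The three rules can be expressed as a small validation routine. The sketch below is a hypothetical Python model (the `MountTable` class and its `mount` method are illustrative names, not Curvine's API). It also assumes that re-mounting an identical UFS/Curvine pair is a harmless no-op, which the rules do not specify.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MountTable:
    """Toy model of Curvine's mount rules; not Curvine's actual implementation."""

    mounts: Dict[str, str] = field(default_factory=dict)  # Curvine path -> UFS path

    @staticmethod
    def normalize(path: str) -> str:
        # Treat "/s3/" and "/s3" as the same Curvine path.
        return "/" + path.strip("/")

    def mount(self, ufs_path: str, cv_path: str) -> None:
        cv_path = self.normalize(cv_path)
        # Rule 1: the root path can never be a mount point.
        if cv_path == "/":
            raise ValueError("mounting to the root path is not allowed")
        # Rule 3: a Curvine path maps to at most one UFS.
        existing = self.mounts.get(cv_path)
        if existing is not None and existing != ufs_path:
            raise ValueError(f"{cv_path} is already mounted to {existing}")
        # Rule 2: no new mount underneath an already-mounted path.
        for mount_point in self.mounts:
            if cv_path.startswith(mount_point + "/"):
                raise ValueError(f"{cv_path} is inside existing mount {mount_point}")
        # Assumption: re-mounting the identical pair is treated as a no-op here.
        self.mounts[cv_path] = ufs_path
```

Because the nested-mount check compares against `mount_point + "/"`, a sibling such as `/s3x` is not mistaken for a child of `/s3`.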

Mount command:

```bash
bin/cv mount <UFS_PATH> <CV_PATH> [OPTIONS]
```
  • <UFS_PATH>: the UFS path, e.g., s3://bucket/path.
  • <CV_PATH>: the Curvine path, e.g., /ufs/path.
  • [OPTIONS]: optional parameters such as --config, --provider, --ttl-ms, and --write-type.

Example:

```bash
bin/cv mount s3://ai/xuen-test /s3 \
--config s3.endpoint_url=http://hostname.com \
--config s3.region_name=cn \
--config s3.credentials.access=access_key \
--config s3.credentials.secret=secret_key \
--config s3.path_style=true
```
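To script mounts, the options can be assembled as an argument list. The helper below is hypothetical: `build_mount_argv` is not part of Curvine, and the mapping of keyword names to flags (`ttl_ms` to `--ttl-ms`, `True` to a bare switch, `False` to `--flag=false`, other values as a separate argument) is an assumption modeled on the `--config key=value` and `--check-path-consist=false` forms shown on this page.

```python
import shlex
from typing import Dict, List, Optional


def build_mount_argv(
    ufs_path: str,
    cv_path: str,
    configs: Optional[Dict[str, str]] = None,
    **options: object,
) -> List[str]:
    """Assemble a `bin/cv mount` argument list (hypothetical helper, not a Curvine API)."""
    argv = ["bin/cv", "mount", ufs_path, cv_path]
    # Every backend parameter becomes its own repeated `--config key=value` pair.
    for key, value in (configs or {}).items():
        argv += ["--config", f"{key}={value}"]
    # Keyword names map to flags: ttl_ms -> --ttl-ms, write_type -> --write-type.
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # bare boolean switch, e.g. --update
        elif value is False:
            argv.append(f"{flag}=false")  # explicit off, e.g. --check-path-consist=false
        else:
            argv += [flag, str(value)]  # valued option, e.g. --ttl-ms 7d
    return argv


argv = build_mount_argv(
    "s3://ai/xuen-test",
    "/s3",
    {"s3.endpoint_url": "http://hostname.com", "s3.region_name": "cn"},
    write_type="cache_mode",
    ttl_ms="7d",
)
# prints: bin/cv mount s3://ai/xuen-test /s3 --config s3.endpoint_url=http://hostname.com
#         --config s3.region_name=cn --write-type cache_mode --ttl-ms 7d  (one line)
print(shlex.join(argv))
```

Passing the list form to `subprocess.run(argv, check=True)` avoids shell quoting issues; `shlex.join` is used here only to show the equivalent command line.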
tip

After a UFS is mounted, you can access its directories and files through the command line or the API. However, UFS data is not automatically synchronized into Curvine; only the paths you explicitly load are cached.

Mounting Parameters

| Parameter | Type | Default | Description | Example |
| --- | --- | --- | --- | --- |
| --config <key=value> | repeated | none | UFS backend parameters | --config s3.endpoint_url=http://... |
| --update | bool | false | Update an existing mount | --update |
| --check-path-consist | bool | true | Require the UFS path and Curvine path to map consistently | --check-path-consist=false |
| --read-verify-ufs | bool | false | Compare cache reads against UFS metadata (mtime / len) | --read-verify-ufs |
| --ttl-ms | duration | 7d | TTL for the mount | 24h, 7d, 30d |
| --replicas | int | inherited | Replica count override | 3 |
| --block-size | size | inherited | Block size override | 64MB, 128MB, 256MB |
| --storage-type | enum | inherited | Storage medium type | mem / ssd / disk |
| --write-type | enum | fs_mode | Mount write behavior | cache_mode / fs_mode |
| --provider | enum | auto | Force the backend implementation | auto / oss-hdfs / opendal |
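As a rough guide to the duration and size values in the table, the sketch below converts them to milliseconds and bytes. It is an illustration, not Curvine's parser: the accepted unit suffixes and the 1024-based interpretation of `MB` are assumptions. Under these assumptions the default `7d` TTL corresponds to 604,800,000 ms.

```python
import re
from typing import Dict, Tuple

# Assumed unit tables: the page only shows examples such as 24h / 7d / 30d and
# 64MB / 128MB / 256MB; binary (1024-based) sizes are an assumption here.
DURATION_MS: Dict[str, int] = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
SIZE_BYTES: Dict[str, int] = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def split_quantity(text: str) -> Tuple[int, str]:
    """Split '24h' into (24, 'h'); reject anything that is not <integer><unit>."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    return int(match.group(1)), match.group(2)


def ttl_to_ms(text: str) -> int:
    value, unit = split_quantity(text)
    if unit.lower() not in DURATION_MS:
        raise ValueError(f"unknown duration unit {unit!r}")
    return value * DURATION_MS[unit.lower()]


def block_size_to_bytes(text: str) -> int:
    value, unit = split_quantity(text)
    if unit.upper() not in SIZE_BYTES:
        raise ValueError(f"unknown size unit {unit!r}")
    return value * SIZE_BYTES[unit.upper()]
```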

Mount Modes

Write Type

The current main branch exposes only two write types:

| Mode | Behavior | Typical use case |
| --- | --- | --- |
| cache_mode | Writes go through to the underlying storage path; Curvine acts mainly as a unified-access and cached-read layer | Data that primarily lives in UFS |
| fs_mode | Writes go into the Curvine namespace first; the first mount of an fs_mode path can trigger a metadata resync | A Curvine-managed, cached filesystem view over mounted data |
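The two modes differ mainly in which layer receives a write first. The toy model below encodes only what the write-type table states; `WriteType`, `resolve_write_type`, and `first_write_layer` are hypothetical names, and the model says nothing about when, or whether, `fs_mode` data is later persisted to UFS.

```python
from enum import Enum
from typing import Optional


class WriteType(Enum):
    CACHE_MODE = "cache_mode"
    FS_MODE = "fs_mode"


def resolve_write_type(option: Optional[str] = None) -> WriteType:
    # --write-type falls back to fs_mode when omitted (its default in the parameter table).
    return WriteType(option) if option is not None else WriteType.FS_MODE


def first_write_layer(write_type: WriteType) -> str:
    """Which layer a write reaches first under each mode (toy model only)."""
    if write_type is WriteType.CACHE_MODE:
        return "ufs"  # write-through to the underlying storage path
    return "curvine"  # fs_mode: the Curvine namespace receives the write first
```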

Read Verification

For read-side validation, the current user-facing control is --read-verify-ufs:

| Mode | Behavior |
| --- | --- |
| disabled (default) | Trusts cached data and applies the normal unified filesystem fallback rules |
| enabled | Compares the cache state against UFS metadata (mtime and length) before serving reads |
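The enabled check can be pictured as a metadata comparison. In this hypothetical sketch, `FileMeta` and `can_serve_from_cache` are illustrative names, and treating a match as exact equality of both mtime and length is an assumption; what Curvine does after a mismatch (for example, re-reading from UFS) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    mtime_ms: int  # last-modified time in milliseconds
    length: int    # file length in bytes


def can_serve_from_cache(cached: FileMeta, ufs: FileMeta, read_verify_ufs: bool) -> bool:
    """Decide whether a cached copy may be served (illustrative model of --read-verify-ufs)."""
    if not read_verify_ufs:
        # disabled: trust the cached data as-is.
        return True
    # enabled: the cached copy is valid only if mtime and length both match UFS.
    return cached.mtime_ms == ufs.mtime_ms and cached.length == ufs.length
```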
note

For s3://... mounts, the CLI can auto-fill s3.bucket_name from the URI. For hdfs://... mounts, it can infer hdfs.namenode and hdfs.root from the URI. When Kerberos keys are present without hdfs.kerberos.ccache or KRB5CCNAME, the CLI prints a warning.
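The auto-fill behavior described in the note can be approximated with `urllib.parse`. This is a hypothetical re-creation, not the CLI's code: the exact value formats for `hdfs.namenode` and `hdfs.root`, the rule that explicit `--config` values take precedence, and the reading of "Kerberos keys" as any `hdfs.kerberos.*` setting are all assumptions.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple
from urllib.parse import urlparse


def infer_ufs_configs(
    ufs_path: str,
    configs: Dict[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical re-creation of the CLI's URI-based auto-fill and Kerberos warning."""
    env = os.environ if env is None else env
    uri = urlparse(ufs_path)
    filled = dict(configs)
    warnings: List[str] = []
    if uri.scheme == "s3":
        # s3://bucket/path -> s3.bucket_name = "bucket" (explicit --config wins here).
        filled.setdefault("s3.bucket_name", uri.netloc)
    elif uri.scheme == "hdfs":
        # Assumed value formats: namenode keeps scheme + authority, root is the path.
        filled.setdefault("hdfs.namenode", f"hdfs://{uri.netloc}")
        filled.setdefault("hdfs.root", uri.path or "/")
        # Assumption: "Kerberos keys" means any hdfs.kerberos.* setting.
        has_kerberos = any(key.startswith("hdfs.kerberos.") for key in filled)
        if has_kerberos and "hdfs.kerberos.ccache" not in filled and not env.get("KRB5CCNAME"):
            warnings.append("Kerberos keys set but no hdfs.kerberos.ccache or KRB5CCNAME")
    return filled, warnings
```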

Unified Access

Once a UFS is mounted, Curvine presents a unified file system view: you access the UFS exactly as you would access the Curvine file system. Clients, command-line tools, FUSE, and other interfaces all reach the UFS through the same unified Curvine path.
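Unified access relies on mapping a Curvine path back to its UFS location through the mount table. The sketch below shows one way to do that with a prefix match; `resolve_ufs_path` is a hypothetical helper, and Curvine's internal resolution may differ.

```python
from typing import Dict, Optional


def resolve_ufs_path(cv_path: str, mounts: Dict[str, str]) -> Optional[str]:
    """Map a Curvine path to its UFS location via the mount table (illustrative only)."""
    cv_path = "/" + cv_path.strip("/")
    # Nested mounts are not allowed, so at most one mount point can be a prefix;
    # checking the longest mount points first is purely defensive.
    for mount_point in sorted(mounts, key=len, reverse=True):
        if cv_path == mount_point or cv_path.startswith(mount_point + "/"):
            suffix = cv_path[len(mount_point):]
            return mounts[mount_point].rstrip("/") + suffix
    return None  # not under any mount point: a Curvine-only path


mounts = {"/s3": "s3://ai/xuen-test", "/hdfs": "hdfs://nn1:8020/warehouse"}
print(resolve_ufs_path("/s3/models/v1.bin", mounts))  # s3://ai/xuen-test/models/v1.bin
```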

tip
  • Curvine does not cache UFS metadata, so there is no consistency issue when accessing UFS: accessing it through Curvine behaves the same as accessing it directly. If reading cached data from Curvine fails, Curvine automatically falls back to reading the data from UFS.
  • If using the cv command, you can use --cache-only to temporarily disable unified access and view only files cached in Curvine. See the CLI page for details.
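The fallback rule, together with `--cache-only`, can be sketched as follows. `unified_read` and the two reader callables are hypothetical; in this model the `cache` dict holds only paths that were explicitly loaded into Curvine, since UFS data is not synchronized automatically.

```python
from typing import Callable


def unified_read(
    path: str,
    read_cache: Callable[[str], bytes],
    read_ufs: Callable[[str], bytes],
    cache_only: bool = False,
) -> bytes:
    """Serve from the Curvine cache, falling back to UFS unless cache_only is set."""
    try:
        return read_cache(path)
    except OSError:
        # Cache miss or cache read failure (FileNotFoundError is an OSError subclass).
        if cache_only:
            raise  # --cache-only: only data already cached in Curvine is visible
        return read_ufs(path)


# Example: only /s3/a.txt has been loaded into the cache.
cache = {"/s3/a.txt": b"cached"}
ufs = {"/s3/a.txt": b"from-ufs", "/s3/b.txt": b"ufs-only"}


def read_cache(path: str) -> bytes:
    if path not in cache:
        raise FileNotFoundError(path)
    return cache[path]


print(unified_read("/s3/b.txt", read_cache, ufs.__getitem__))  # b'ufs-only'
```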

Disabling Unified Access

If you don't want to use unified access, you can add or modify the following configuration:

```toml
[client]
enable_unified_fs = false
```