Data Orchestration

Curvine provides a UFS (Unified File System) view for managing all supported distributed storage systems, such as S3 and HDFS.

Mounting

Curvine can connect to multiple UFS sources by mounting each one at a different Curvine path. Curvine has no default UFS configuration, so you must mount a UFS source before you can load data from it.

[Figure: mount architecture]

Curvine persists the mount table in its metadata, so mounts survive a restart and do not need to be recreated. Mounts must follow these rules:

  • Mounting to the root path is not allowed.
  • Mounting another UFS beneath an already mounted path is not allowed.
  • A single mount path cannot be mapped to more than one UFS.
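The three rules above can be sketched as a small validation routine. This is a minimal illustration, not Curvine's implementation: the `MountTable` class and its methods are hypothetical, and treating an identical remount (same path, same UFS) as a no-op is an assumption the source does not confirm.

```python
class MountTable:
    """Toy in-memory mount table (hypothetical; not Curvine's code)."""

    def __init__(self):
        self._mounts = {}  # Curvine mount path -> UFS path

    @staticmethod
    def _normalize(path):
        # Make "/s3/" and "/s3" compare equal; "/" stays "/".
        return "/" + path.strip("/")

    def mounts(self):
        return dict(self._mounts)

    def mount(self, ufs_path, curvine_path):
        target = self._normalize(curvine_path)

        # Rule 1: the root path cannot be a mount point.
        if target == "/":
            raise ValueError("mounting to the root path is not allowed")

        for existing, existing_ufs in self._mounts.items():
            # Rule 3: one mount path cannot map to two different UFS sources.
            if target == existing:
                if existing_ufs != ufs_path:
                    raise ValueError(f"{target} is already mounted to {existing_ufs}")
                return  # identical remount treated as a no-op (assumption)

            # Rule 2: no new mount beneath an already mounted path.
            if target.startswith(existing + "/"):
                raise ValueError(f"{target} lies under existing mount {existing}")

        self._mounts[target] = ufs_path


table = MountTable()
table.mount("s3://ai/xuen-test", "/s3")
```

Only the rules stated above are enforced; whether Curvine also rejects a new mount that would sit above an existing one is not covered here.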

Mount command:

bin/cv mount ufs_path curvine_path [configs]
  • ufs_path: UFS path, e.g., s3://bucket/path.
  • curvine_path: Curvine path, e.g., /ufs/path.
  • configs: Optional parameters such as access_key_id, secret_access_key, region, and endpoint.

Example:

bin/cv mount s3://ai/xuen-test /s3 \
-c s3.endpoint_url=http://hostname.com \
-c s3.region_name=cn \
-c s3.credentials.access=access_key \
-c s3.credentials.secret=secret_key \
-c s3.path_style=true
tip

After a UFS is mounted, you can access its directories and files through the command line or the API. UFS data is not synchronized to Curvine automatically, however; it is brought into Curvine only when you load specific paths.
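When mounts are scripted, it can help to assemble the command from a dictionary of settings. The sketch below builds the same invocation as the example above, using only the `bin/cv mount ufs_path curvine_path` form and the `-c key=value` options shown there. The helper name `build_mount_command` is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def build_mount_command(ufs_path, curvine_path, configs=None, cv_bin="bin/cv"):
    """Return the argv list for a mount call (hypothetical helper)."""
    argv = [cv_bin, "mount", ufs_path, curvine_path]
    # Each config entry becomes one "-c key=value" pair, in insertion order.
    for key, value in (configs or {}).items():
        argv += ["-c", f"{key}={value}"]
    return argv


argv = build_mount_command(
    "s3://ai/xuen-test",
    "/s3",
    {
        "s3.endpoint_url": "http://hostname.com",
        "s3.region_name": "cn",
        "s3.credentials.access": "access_key",
        "s3.credentials.secret": "secret_key",
        "s3.path_style": "true",
    },
)

# Dry run: show the command instead of executing it.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute it; keeping the arguments as a list avoids shell-quoting problems with secrets or endpoints that contain special characters.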

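Mounting Parameters

| Parameter              | Type     | Default | Description                     | Example            |
|------------------------|----------|---------|---------------------------------|--------------------|
| --ttl-ms               | duration | 0       | Cache data expiration time      | 24h, 7d, 30d       |
| --ttl-action           | enum     | none    | Expiration policy: delete/none  | delete             |
| --replicas             | int      | 1       | Number of data replicas (1-5)   | 3                  |
| --block-size           | size     | 128MB   | Cache block size                | 64MB, 128MB, 256MB |
| --consistency-strategy | enum     | always  | Consistency strategy            | none/always/period |
| --storage-type         | enum     | disk    | Storage medium type             | mem/ssd/disk       |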
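To make the duration and size value formats concrete, here is a minimal sketch that converts strings such as `24h` or `128MB` into milliseconds and bytes and checks the replica range. The accepted suffixes, the 1024-based size units, and the function names are assumptions for illustration; Curvine's own parser may accept a different set.

```python
import re

# Assumed unit tables; only h, d, and MB appear in the documented examples.
_DURATION_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_SIZE_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def parse_duration_ms(text):
    """Convert '24h', '7d', or a bare number of milliseconds to milliseconds."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|h|d)?", text.strip().lower())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _DURATION_MS[unit or "ms"]


def parse_size_bytes(text):
    """Convert '64MB' or '128MB' to a byte count (1 MB taken as 1024 * 1024)."""
    match = re.fullmatch(r"(\d+)\s*(b|kb|mb|gb)", text.strip().lower())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    value, unit = match.groups()
    return int(value) * _SIZE_BYTES[unit]


def check_replicas(count):
    """Enforce the documented 1-5 range for --replicas."""
    if not 1 <= count <= 5:
        raise ValueError("replicas must be between 1 and 5")
    return count


print(parse_duration_ms("7d"), parse_size_bytes("128MB"), check_replicas(3))
```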

Mount Modes

Write Cache

WriteType controls how writes are propagated between the Curvine cache and the underlying storage (UFS).

| Mode                        | Behavior (Sync/Async)                                        | Consistency                                                          | Use Cases                                                                 |
|-----------------------------|--------------------------------------------------------------|----------------------------------------------------------------------|---------------------------------------------------------------------------|
| Cache mode                  | Write only to Curvine cache, not sync to UFS                 | Cache-only consistency, no data in UFS                               | Temporary data, scratch data, disposable intermediate results             |
| Through mode                | Write directly to UFS, bypass cache                          | Strong consistency, data directly persisted to UFS                   | Write-once-read-many scenarios, cache-unbeneficial data                   |
| AsyncThrough mode (default) | Write to cache first, async sync to UFS, return immediately  | Eventual consistency, cache readable immediately, UFS updated async  | Machine learning training, scenarios balancing performance and persistence |
| CacheThrough mode           | Sync write to cache and UFS, wait for UFS completion         | Strong consistency, ensure data exists in both cache and UFS         | Shared data, scenarios requiring strong persistence guarantees            |
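The difference between the four write modes can be illustrated with a toy model in which the cache and the UFS are plain dictionaries. This is only a sketch of the behavior described in the table: the `WriteType` enum values, the `ToyStore` class, and the explicit `flush()` standing in for background synchronization are all hypothetical.

```python
from enum import Enum


class WriteType(Enum):
    CACHE = "cache"
    THROUGH = "through"
    ASYNC_THROUGH = "async_through"
    CACHE_THROUGH = "cache_through"


class ToyStore:
    """Dictionaries standing in for the Curvine cache and the UFS."""

    def __init__(self):
        self.cache = {}
        self.ufs = {}
        self.pending = []  # paths waiting for asynchronous upload

    def write(self, path, data, mode=WriteType.ASYNC_THROUGH):
        if mode is WriteType.CACHE:
            self.cache[path] = data        # never reaches the UFS
        elif mode is WriteType.THROUGH:
            self.ufs[path] = data          # bypasses the cache
        elif mode is WriteType.ASYNC_THROUGH:
            self.cache[path] = data        # readable from cache right away
            self.pending.append(path)      # UFS is updated later
        elif mode is WriteType.CACHE_THROUGH:
            self.cache[path] = data        # both copies exist on return
            self.ufs[path] = data
        else:
            raise ValueError(f"unknown write type: {mode!r}")

    def flush(self):
        """Stand-in for the background sync that AsyncThrough relies on."""
        while self.pending:
            path = self.pending.pop(0)
            self.ufs[path] = self.cache[path]


store = ToyStore()
store.write("/s3/model.bin", b"weights")   # default: AsyncThrough
print("/s3/model.bin" in store.ufs)        # UFS not yet updated
store.flush()
print("/s3/model.bin" in store.ufs)        # UFS updated after sync
```

The window between `write` and `flush` is what the table calls eventual consistency: the data is already readable from the cache but not yet visible to anyone reading the UFS directly.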

Read Cache

ConsistencyStrategy determines whether cached data is validated against the underlying storage (UFS) when it is read.

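| Mode   | Behavior                                            | Consistency                                      | Use Cases                                                                                           |
|--------|-----------------------------------------------------|--------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| None   | Trust cache, no consistency validation during reads | No consistency guarantee, may read stale data    | High-performance read scenarios, relatively static base data, temporary data/intermediate results  |
| Always | Validate cache matches UFS on every read            | Strong consistency guarantee, ensure latest data | Intermittent data updates, multi-client shared data, strong consistency requirement scenarios      |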
note

TTL controls read-cache behavior in Curvine: it determines how long cached data remains valid, and expiry triggers an automatic cache refresh or cleanup.
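A toy read path shows how the consistency strategy and TTL interact. This sketch assumes that validation compares the UFS modification time recorded at caching time, that a TTL of 0 means entries never expire, and that an expired entry is reloaded from the UFS; none of these details are confirmed by the source, and `ToyReadCache` is a hypothetical name.

```python
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    ufs_mtime: float  # UFS modification time observed when the entry was cached
    cached_at: float  # clock value when the entry was cached


class ToyReadCache:
    """ufs maps path -> (data, mtime); strategy is 'none' or 'always'."""

    def __init__(self, ufs, strategy="always", ttl_seconds=0, clock=time.time):
        self.ufs = ufs
        self.strategy = strategy
        self.ttl_seconds = ttl_seconds  # 0 is treated as "no expiry" (assumption)
        self.clock = clock
        self.entries = {}

    def read(self, path):
        entry = self.entries.get(path)
        if entry is not None and self._is_valid(path, entry):
            return entry.data
        return self._load(path)

    def _is_valid(self, path, entry):
        # TTL check applies regardless of the consistency strategy.
        if self.ttl_seconds > 0 and self.clock() - entry.cached_at >= self.ttl_seconds:
            return False
        # "always": compare against the UFS on every read.
        if self.strategy == "always":
            return self.ufs[path][1] == entry.ufs_mtime
        # "none": trust the cache, possibly returning stale data.
        return True

    def _load(self, path):
        data, mtime = self.ufs[path]
        self.entries[path] = CacheEntry(data, mtime, self.clock())
        return data
```

With `strategy="none"` an updated UFS object stays invisible until the TTL expires, while `strategy="always"` pays a validation check on each read in exchange for always returning current data.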

Unified Access

After a UFS is mounted, Curvine provides a unified file system view, and you can access the UFS just as you would the Curvine file system. Clients, command-line tools, FUSE, and other interfaces can all access the UFS through a unified path.

tip
  • Curvine does not cache UFS metadata, so metadata access raises no consistency issues: accessing a UFS through Curvine behaves the same as accessing it directly. If reading cached data from Curvine fails, the read automatically falls back to the UFS.
  • If you use the cv command, the cache-only parameter temporarily disables unified access so that only files cached in Curvine are shown. See the fs subcommand for details.
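Unified access can be pictured as translating a Curvine path into a UFS path through the mount table, with reads falling back to the UFS when the cache cannot serve them. The sketch below is illustrative only: `resolve_ufs_path`, `read_file`, and the `ufs_read` callback are hypothetical names, and a cache miss (modeled as a missing dictionary key) stands in for any failed cache read.

```python
def resolve_ufs_path(mounts, curvine_path):
    """Translate a Curvine path to its UFS path, or None if it is not mounted.

    mounts maps Curvine mount points to UFS roots, e.g. {"/s3": "s3://ai/xuen-test"}.
    Nested mounts are not allowed, so at most one mount point can match.
    """
    for mount_point, ufs_root in mounts.items():
        if curvine_path == mount_point or curvine_path.startswith(mount_point + "/"):
            suffix = curvine_path[len(mount_point):]
            return ufs_root.rstrip("/") + suffix
    return None


def read_file(path, mounts, cache, ufs_read):
    """Serve from the cache when possible, otherwise fall back to the UFS."""
    try:
        return cache[path]
    except KeyError:
        ufs_path = resolve_ufs_path(mounts, path)
        if ufs_path is None:
            raise FileNotFoundError(path)
        return ufs_read(ufs_path)  # ufs_read is a caller-supplied function


mounts = {"/s3": "s3://ai/xuen-test"}
print(resolve_ufs_path(mounts, "/s3/data/a.txt"))
```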

Disabling Unified Access

If you don't want to use unified access, you can add or modify the following configuration:

[client]
enable_unified_fs = false