
Curvine Benchmark: 300 Million Files in Just 38 GB of Memory

· 5 min read

Translated from the original Chinese article published on April 6, 2026.

In distributed file systems, metadata memory efficiency, concurrent request handling, and small-file throughput are core indicators of overall system capability. Curvine recently completed a high-intensity metadata benchmark, and the results were clear: Curvine reached a new high-water mark for open-source metadata efficiency while delivering core capabilities comparable to commercial distributed storage products.

🔥 Key Takeaways

  • Efficient memory usage: With 800,000 directories and 300 million files, and one block written per file, Curvine used just 38 GB of memory. That is roughly on par with the metadata-memory capability described for the commercial edition of JuiceFS in reference [1].
  • Low latency under massive concurrency: With 100,000 clients looping operations, throughput held steady at 53,000 ops/s. Average command latency stayed below 2 ms, and P99 latency stayed below 9 ms.
  • High small-file throughput: Under heavy concurrent small-file writes, Curvine sustained 12 million small files per hour, with an average write time of 0.3 ms per file.
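
As a quick sanity check, the headline numbers imply only a bit over a hundred bytes of Master memory per inode. A back-of-the-envelope sketch in Python, using only the reported totals (and assuming the 38 GB figure means GiB):

files = 300_000_000
dirs = 800_000
memory_bytes = 38 * 1024**3  # reported Master memory, assuming GiB

# Roughly 136 bytes of Master memory per inode
print(memory_bytes / (files + dirs))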

๐Ÿ“ Test Setupโ€‹

  • Curvine cluster: one Master and one Worker
  • Benchmark machine: Alibaba Cloud ecs.i5.8xlarge, 32 cores, 256 GB RAM
  • Clients: 100,000 FUSE clients
  • Operations: repeated high-frequency commands such as mkdir, touch, file writes, and ls

📊 Core Benchmark Results

🧠 Memory Efficiency: A New Open-Source High-Water Mark

  • Managed scale: 800,000 directories + 300 million files
  • Per-file data written: 1 block
  • Total memory usage: 38 GB
  • Comparison point: comparable to the metadata-memory capability described for the commercial edition of JuiceFS

Memory efficiency benchmark

โฑ๏ธ High Concurrency, Low Latency at 100,000 Clientsโ€‹

  • Concurrent clients: 100,000 FUSE clients
  • Stable throughput: 53,000 ops/s
  • Average latency: below 2 ms
  • P99 latency: below 9 ms

QPS under concurrency

Latency under concurrency

Connection overhead was also low: 100,000 live connections consumed only 1.1 GB, or about 11.5 KB per connection.

Connection overhead

Once the benchmark stopped, Master memory dropped immediately from 39.1 GB back to 38 GB.

Master memory after benchmark stop

🚀 Small-File Throughput: Built for Scale

  • Files written per hour: 12 million small files
  • Average write time per file: 0.3 ms
  • Throughput remained saturated even under high concurrency
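
The two headline figures above are mutually consistent, which is a useful sanity check: 12 million files per hour is about 3,333 files per second, which is also the per-stream rate that a 0.3 ms average write time implies:

files_per_hour = 12_000_000
avg_write_seconds = 0.0003      # 0.3 ms per file

print(files_per_hour / 3600)    # ~3333 files/s sustained over the hour
print(1 / avg_write_seconds)    # ~3333 writes/s for one serial write stream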

At 15:00, Curvine had written 287 million files:

Small-file count at 15:00

At 16:00, the total had reached 299 million files:

Small-file count at 16:00

๐Ÿ—๏ธ Metadata Architectureโ€‹

Curvine's metadata subsystem stands out not only for its large-scale memory efficiency and high-concurrency performance but also in direct comparison with other open-source systems. These results come from a deliberately designed metadata architecture.

Curvine metadata architecture overview

💡 Design Principles

  1. A single Master should support very large namespaces and massive numbers of small files.
  2. The system should provide high concurrency and low latency for frequent operations such as create, delete, and update.
  3. External dependencies should be minimized to reduce operational complexity while keeping the system stable.

Based on those goals, Curvine combines an in-memory directory tree, standalone RocksDB, and a Raft-based consistency mechanism. This three-layer design balances performance, scale, and stability.

| Layer | Core Responsibility | Why It Exists |
| --- | --- | --- |
| In-memory directory tree | Stores directory-structure metadata such as directory names and parent-child relationships; handles path resolution, directory listing, and other high-frequency namespace operations | Keeps the hottest namespace operations in memory so directory lookups and path matching stay in the microsecond range; stores only lightweight directory structure to maximize scale |
| Metadata RocksDB (inode engine) | Persists complete file and directory metadata, including file size, permissions, mtime, block locations, and full directory relationships | Uses column families to separate different metadata types, improving read/write efficiency and making frequent metadata updates easier to manage |
| Raft log RocksDB | Persists the ordered log of all metadata mutations (create, delete, update) used for node-to-node synchronization | Separates log storage from metadata storage so replication, compaction, cleanup, and recovery do not interfere with metadata reads and writes |
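
To make the division of labor concrete, here is a minimal, hypothetical sketch of how a metadata mutation could flow through the three layers. All names are illustrative stand-ins, not Curvine's actual API:

class MiniMaster:
    def __init__(self):
        self.raft_log = []   # stands in for the Raft log RocksDB
        self.inodes = {}     # stands in for the metadata RocksDB (inode engine)
        self.dir_tree = {}   # in-memory tree: parent path -> child names

    def create_file(self, path, attrs):
        # 1. Record the mutation first so followers can replay it in order.
        self.raft_log.append(("create", path, attrs))
        # 2. Persist the full inode record (size, permissions, mtime, blocks, ...).
        self.inodes[path] = attrs
        # 3. Update the lightweight in-memory namespace used for fast lookups.
        parent, _, name = path.rpartition("/")
        self.dir_tree.setdefault(parent or "/", set()).add(name)

    def ls(self, dir_path):
        # Hot path: served entirely from memory, no RocksDB read required.
        return sorted(self.dir_tree.get(dir_path, ()))

m = MiniMaster()
m.create_file("/data/a.txt", {"size": 0, "mode": 0o644})
print(m.ls("/data"))  # ['a.txt']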

๐Ÿ›ก๏ธ FsMode: Working with UFS for Safe Durabilityโ€‹

Curvine also supports FsMode, which synchronizes metadata and file data to the underlying file system (UFS). This creates a dual safety model of local storage plus disk-backed fallback, preventing data loss without sacrificing runtime performance.

🚀 Future Directions

Curvine's metadata system will keep pushing forward in three areas:

  1. 10 billion files on a single node: continue deepening single-node capability until a standard 512 GB memory machine can manage metadata for 10 billion files.
  2. Federation: improve cluster-scale metadata expansion with an HDFS Federation-like model that partitions by directory and can scale beyond 100 billion files. Federation has trade-offs for centralized metadata operations such as mv and ls, and it requires directory planning up front.
  3. Pluggable metadata management: abstract the metadata interface and support pluggable metadata backends for better flexibility and adaptability.

📚 References

  1. https://mp.weixin.qq.com/s/zbBUQ4P53PPWQjOHQmw8uw
  2. https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs-rbf/HDFS%20RouterFederation.html

👇 Follow Us

We regularly share hands-on work on distributed storage, metadata optimization, and high-concurrency benchmarking.

GitHub: https://github.com/CurvineIO/curvine

Curvine: Next-Generation Unified Data Access Layer, Combining POSIX and High-Speed Cache

· 7 min read

In our practice with distributed cache, we have identified core pain points users face: insufficient POSIX semantics support, high resource consumption, and complex operations. To better address these challenges, we have restructured Curvine's architecture and development roadmap, creating a next-generation unified data access layer that combines strong POSIX semantics with high-performance caching, delivering a qualitative improvement in remote data access experience.

Two Core Mount Modes for Diverse Business Needs

Curvine optimizes data read/write patterns into two concise mount point modes: CacheMode and FsMode, each targeting different business scenarios. This design balances cache acceleration with complete semantics support, allowing users to choose flexibly based on actual requirements.

CacheMode: Lightweight Read Cache Acceleration, Tightly Coupled with UFS

CacheMode centers on UFS (Underlying File System), primarily serving as a read cache accelerator and unified proxy for UFS. All read/write operations are UFS-centric:

  • Metadata caching effectively accelerates common operations like ls
  • Write operations go directly to UFS
  • Users maintain strong awareness of UFS without changing existing data operation habits

This is the preferred solution for lightweight UFS read performance improvement.

FsMode: Full Performance Acceleration with Strong POSIX Semantics

FsMode places Curvine itself at the core, with metadata independently managed by Curvine. Its path structure maps one-to-one with UFS, while UFS serves only as Curvine's cold storage layer. This mode provides:

  • Comprehensive read/write cache acceleration
  • Better POSIX semantics support for read/write operations
  • Ideal for large-scale file performance acceleration
  • Best choice for scenarios requiring both semantic completeness and high performance

FsMode Deep Dive: Layered Design Balancing Performance and Consistency

As Curvine's core mode, FsMode employs a layered filesystem mount/write design. Through clear semantic definitions and process planning, it achieves balance among performance, semantics, and data consistency. Let's examine its core design details.

Core Semantic Rules

  1. Unified IO Entry Point: All data read/write operations go through Curvine. Applications should use only Curvine paths; direct UFS access is not recommended as it cannot guarantee data consistency.

  2. Asynchronous Cold Storage Writes: When writing data, it first lands in Curvine (metadata + blocks). The Master side periodically submits Load/Dump tasks based on policies to asynchronously flush data to UFS (e.g., S3), making frontend write operations more efficient.

  3. Intelligent Read Backfill: When reading data, Curvine is prioritized. If data has been evicted or exists only in UFS, a Load operation backfills UFS data to Curvine. The current read passes through directly from UFS, balancing read speed with data availability.

  4. Flexible Replica Forms: Allows states where only UFS contains data. In such cases, UFS data serves as the file's sole replica, maximizing storage resource utilization.

  5. On-Demand Metadata Synchronization: The mount operation synchronizes all metadata under the directory. Full synchronization is not performed actively afterward. If needed, use the mount resync command to manually update mount point metadata (synchronizing only file metadata that exists solely in UFS).

  6. Lazy Cache Loading: If a file being read has no metadata in Curvine but exists in UFS, the first read will fail. On retry, UFS files are actively fetched. Alternatively, manually trigger the mount resync command in advance to synchronize metadata and avoid read failures.

  7. Fault Tolerance Design: When Master fails, users can access data normally through UFS interfaces. When Worker fails, other replicas can be accessed in multi-replica scenarios; in single-replica scenarios, direct UFS reading ensures business continuity.
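
Rules 3 and 6 together define the read path. A minimal, hypothetical sketch of that logic (function and object names are illustrative, not Curvine's actual API):

def fs_mode_read(path, curvine, ufs, master):
    data = curvine.get(path)
    if data is not None:
        return data  # fast path: served from the Curvine cache
    if not curvine.has_metadata(path):
        # Rule 6: no metadata yet; fetch it from UFS (or run
        # `mount resync` ahead of time to avoid the first-read failure).
        curvine.sync_metadata_from_ufs(path)
    # Rule 3: submit an async Load so future reads hit the cache,
    # but serve *this* read directly from UFS.
    master.submit_load(path)
    return ufs.get(path)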

Consistency Design: Usable Now, Better in Future

The current FsMode consistency implementation:

  • When a UFS path is first mounted to Curvine, all metadata under that directory is mapped to Curvine as a whole
  • If files are written directly to UFS bypassing Curvine, Curvine cannot automatically perceive these changes
  • Manual re-synchronization via sync commands is required in such cases

Future Planning: Curvine will implement automatic perception of UFS metadata change events, enabling near-real-time UFS metadata synchronization. This will fully guarantee data consistency at the technical level, letting users operate without thinking about synchronization at all: truly seamless usage.

FsMode Core Design Objectives

All FsMode designs revolve around clear objectives, ensuring every capability precisely addresses business pain points:

  • Unified Entry Point: All operations go through Curvine paths. Applications don't need to adapt to UFS, reducing development and operations costs.
  • POSIX Semantics: Supports complete POSIX filesystem semantics, including directory trees, random read/write, renaming, atomicity, strong consistency, etc., adapting to various traditional and new applications.
  • Tiered Storage: Curvine layer stores hot data (metadata + optional local blocks), UFS serves as persistent/cold replica, achieving hot/cold data separation for improved storage efficiency and access performance.
  • Background Flushing: Master periodically submits Load/Dump tasks based on business operations and preset policies to asynchronously flush Curvine data to UFS without affecting frontend business.
  • UFS-Only Replica: Supports scenarios where data exists only in UFS (e.g., S3). Data is backfilled on-demand or read-through as needed, balancing storage flexibility with data accessibility.

CacheMode vs FsMode: Core Differences at a Glance

To help you clearly distinguish between the two modes and precisely match business scenarios, we've organized the core comparison dimensions:

| Comparison Item | CacheMode | FsMode |
| --- | --- | --- |
| Semantics support | Only UFS's native semantics | Complete POSIX semantics (directory trees, random read/write, renaming, atomicity, strong consistency, etc.) |
| Write method | Data writes directly to UFS; applications are tightly coupled with UFS | Data writes to Curvine; the Job Manager asynchronously flushes to UFS; applications face only Curvine |
| Metadata management | Metadata is cached but kept strongly consistent with UFS | Metadata maintained by the Curvine Master and periodically synchronized to UFS; Curvine prevails in conflicts; UFS modifications made via other interfaces are not actively perceived by Curvine |
| Read logic | If cached, read from Curvine; if not, submit an async task to load into Curvine and serve the current read directly from UFS | Prefer reading from Curvine; if data is absent, the Master marks it as hot and backfills it into Curvine while the current read is served directly from UFS |
| Data expiration handling | Deletes both metadata and data blocks in Curvine | Deletes only Curvine data blocks, retains metadata |
| Consistency guarantee | Constrained by UFS (e.g., S3 eventual consistency) | Strong consistency on the Curvine side; eventual consistency with UFS via async tasks |

Extreme Resource Optimization, Lighter and More Friendly

Beyond refining functionality and performance, Curvine has deeply optimized resource consumption, upgrading everything from the underlying technology stack to the implementation itself:

Curvine is built in Rust, which gives it high performance and a low resource footprint by default. It also applies optimization techniques such as asynchronous operations and zero-copy to shrink that footprint further.

Production experience shows that a single Curvine Worker process occupies less than 1 GB of memory. In large-scale cluster deployments this materially reduces server investment and operating costs, and makes Curvine easy to adopt even in resource-constrained environments.

Product Philosophy: Not a Replacement, Just a Better Data Access Method

Curvine has had a clear product positioning from the start:

We are not pursuing a general-purpose POSIX filesystem, nor attempting to replace any storage product.

We have always focused on one thing: making remote data access so fast that you cannot feel the "remote" part, without changing users' existing data-operation habits. This is not only a technical challenge but also Curvine's core product philosophy: the best infrastructure is the kind whose existence you never notice.

In the AI era, data volumes are exploding, and the performance and experience of remote data access have become key factors in business efficiency. Curvine combines the strengths of distributed cache technology with the POSIX completeness of distributed filesystems, while adhering to the core principle of "metadata transparency, unchanged file structure." Users gain a leap in remote data access performance without major modifications to existing business systems.

Going forward, Curvine will continue to invest deeply in the unified data access layer: continuously optimizing performance and stability, improving semantics support, and simplifying operations. We are committed to becoming a key part of data infrastructure in the AI era, providing efficient, stable, and lightweight data access support for the digital upgrades of all kinds of businesses.

Finally, we hope more open-source enthusiasts from the storage and Rust fields will join us in building and sharing together!



The Pain of Distributed Cache: Ideal vs. Reality

· 6 min read

After six months of our open-source journey and surveys of multiple users, we now have a clear view of the pros and cons of the distributed cache model. Given the limits of where distributed caches apply, here are some brief reflections.

In today's booming era of big data and artificial intelligence, "storage-compute separation" has become the mainstream paradigm of cloud-native data architecture. Computing resources scale elastically, while data settles uniformly into low-cost object storage (such as S3 or OSS). However, this architecture brings a fatal pain point: the high latency and low throughput of object storage seriously drag down computing performance. Thus the distributed cache layer emerged, hoped to become a high-speed bridge connecting "flexible computing" and "cheap storage."

Distributed file cache systems can "transparently accelerate" access to remote storage and provide a unified namespace. Yet when enterprises eagerly introduce them into production environments, they often run into underwhelming performance gains, operational complexity, and semantic mismatches. This article analyzes the gap between the technical ideal of distributed cache and its reality on the ground, revealing the structural limitations of distributed cache in general-purpose scenarios.

I. The Ideal of Distributed Cache: Unified, Transparent, High-Performance

The original design intention was highly attractive:

  • Unified Namespace: Mount heterogeneous storage such as HDFS, S3, and GCS into a single directory tree; applications only need to access xx://;
  • Transparent Cache: After the first read of remote data, automatically cache it to memory/SSD, with subsequent responses in milliseconds;
  • Ecosystem Compatibility: Seamlessly integrate with mainstream computing engines such as Spark, Presto, and PyTorch, without code modifications;
  • Tiered Storage: Support memory → SSD → HDD multi-level caching, balancing performance and cost.

In demonstration environments, distributed cache can indeed significantly improve the access performance of object storage, especially in scenarios where model training repeatedly reads input.

II. Reality's Pain: Three Structural Defects

However, reality falls far short of this ideal. Distributed cache exposes three unavoidable defects in real business scenarios.

Pain 1: Incomplete POSIX Semantics, Limited Versatility

Distributed cache provides POSIX-like interfaces through FUSE, enabling traditional applications to read remote data like accessing local files. However, its support for POSIX semantics is highly incomplete:

  • โŒ No random write support: Cannot modify bytes in the middle of a file, only allows creating new files or full overwrites;
  • โŒ No truncate, hard links, or file locks;
  • โš ๏ธ Strong consistency missing: Multiple clients may read expired cache, requiring manual metadata refresh.

This means distributed cache simply cannot run databases, log systems, or any programs requiring in-place updates. It is essentially designed for WORM (Write-Once-Read-Many), not a general-purpose file system. Many teams, after attempting to "seamlessly migrate" existing business to distributed cache, discover their applications crash due to write operation failures.

Truth: Distributed cache is not a "distributed POSIX file system," but a "data orchestration layer optimized for batch processing."

Pain 2: High Resource Consumption

Currently, distributed cache systems written in Java or Go generally suffer from high resource consumption.

For example, Java processes often occupy tens of gigabytes of memory, which is wasteful for systems that are supposed to use that memory as cache.

Pain 3: Operational Complexity, ROI Hard to Deliver

Deploying a distributed cache involves multiple components (Master, Worker, Journal, UFS connectors), and resource tuning (memory allocation, cache policies, network configuration) is complex. Worse still:

  • Cache hit rate depends on data access patterns: If jobs are one-time scans (such as ETL), caching has no value;
  • Resource competition: Memory/SSD occupied by Workers competes with Spark Executors for node resources;
  • Troubleshooting difficulties: Issues like cache inconsistency, block loss, and UFS synchronization failures require deep source code analysis to locate.

Many teams invest months in building and tuning cache clusters, only to find limited performance improvement but doubled operational burden, forcing them to abandon it.

III. Reflection: Can Distributed Cache Replace File Systems?

The dilemma of distributed cache reflects a deeper issue: attempting to use a general-purpose intermediate layer to solve all I/O problems is itself a form of technical dogmatism.

Distributed cache delivers significant benefits in large-scale data I/O scenarios such as big data and AI training. But a company that purchases or deploys a distributed cache cluster cannot apply it universally across scenarios, nor fully exploit its capabilities.

Trend: "specialized beats generalized." Rather than maintaining a heavyweight distributed cluster, it is better to build a more versatile tiered file system with a cache acceleration layer in the middle, providing more general support with file system semantics.

IV. Way Out: Rational Choice, Scenario-Driven

Distributed cache is not without merit. It still has value in the following scenarios:

  • ✅ Hybrid cloud/multi-cloud architecture: Unified access to object storage from different cloud providers;
  • ✅ High-reuse read-only datasets: Such as benchmark datasets repeatedly used in AI training;
  • ✅ Dedicated platform teams available: Can bear its operational and tuning costs.

However, for most enterprises, a more pragmatic path is:

  1. First evaluate whether I/O is truly a bottleneck: Confirm through profiling;
  2. Prioritize optimizing data formats and query logic: Use Iceberg/Lance instead of raw files;
  3. Avoid "using cache for the sake of using cache": Caching is a means, not an end;
  4. Build general-purpose file system capabilities: build a cache with file-system-grade capabilities and fully explore its versatility.

Conclusion

Distributed cache is a phased technical experiment that has promoted the development of data orchestration concepts. However, its "pain" also warns us: there is no silver bullet, only trade-offs. On the road to pursuing high performance, blindly introducing general-purpose middleware often backfires. True engineering wisdom lies in understanding the essence of business and choosing the most matching tool, even if it's not "cool" enough.

Distributed cache should not be a standard configuration of architecture, but rather a precise scalpel for specific scenarios. Only by building more general, lightweight, and efficient tiered file systems can we avoid falling into the "cache pain" and let data truly flow, rather than being trapped in layers of abstraction.



Observability Construction for Curvine

· 7 min read

Curvine, as a high-performance distributed cache system, has strict requirements for performance, stability, and reliability. To keep the system performing optimally under varied load conditions, and to locate and resolve potential issues quickly, we built a comprehensive monitoring solution based on Prometheus and Grafana. It provides deep observability into Master nodes, Worker nodes, FUSE nodes, and the S3 Gateway, enabling real-time monitoring of cache-cluster scale, operational status, performance metrics, and resource usage through key metrics collected from each component.

Monitoring Architecture

This monitoring system adopts the following core components:

  • Prometheus: Responsible for metric collection, storage, and querying
  • Grafana: Provides data visualization and dashboard display

Observability Metrics

Master Node Metrics

As the cluster's metadata management center, Master nodes provide the following key metrics:

Capacity Metrics

Capacity metrics are fundamental for evaluating system storage resource usage, crucial for capacity planning, resource optimization, and preventive maintenance. By monitoring these metrics, storage bottlenecks can be identified in a timely manner, capacity requirements can be predicted, and sufficient space can be ensured to handle business growth.

| Metric Name | Description |
| --- | --- |
| inode_dir_num | Number of directories |
| inode_file_num | Number of files |
| num_blocks | Total number of blocks |
| blocks_size_avg | Average block size |
| capacity | Total storage capacity |
| available | Available storage space |
| fs_used | File system used space |

Resource Metrics

Resource metrics reflect the system's usage of computing resources, significant for performance tuning, resource allocation, and fault prevention. Memory usage directly affects system performance and stability, especially for RocksDB as the core storage engine, whose memory usage needs precise monitoring to avoid memory overflow and performance degradation.

| Metric Name | Description |
| --- | --- |
| used_memory_bytes | Used memory in bytes |
| rocksdb_used_memory_bytes | RocksDB memory usage |

Cluster Status Metrics

Cluster status metrics provide a real-time view of the overall system health, crucial for ensuring high availability and data consistency. By monitoring Worker node status and replication task execution, node failures, data inconsistencies, and other issues can be quickly identified, ensuring reliable operation of the distributed cache system.

| Metric Name | Description |
| --- | --- |
| worker_num | Number of workers (classified by status) |
| replication_staging_number | Number of blocks waiting for replication |
| replication_inflight_number | Number of blocks currently being replicated |
| replication_failure_count | Total cumulative replication failures |

Performance Metrics

Performance metrics are core indicators for measuring system responsiveness and processing efficiency, playing a key role in performance optimization and capacity planning. The total count and total time of RPC requests can be used to calculate average response time, directly reflecting system processing capability, while analysis of various operation durations helps identify performance bottlenecks and guide system optimization.

| Metric Name | Description |
| --- | --- |
| rpc_request_total_count | Total RPC request count |
| rpc_request_total_time | Total RPC request time |
| operation_duration | Operation duration (classified by type, excluding heartbeat) |
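
As noted above, average RPC latency falls out of the ratio of the two counters. A small sketch using the standard Prometheus HTTP API (the Prometheus URL is a placeholder, and the time unit follows whatever the counter exports):

import requests

PROM = "http://prometheus:9090/api/v1/query"  # placeholder address
# Ratio of counter rates = average time per request over the window.
query = "rate(rpc_request_total_time[5m]) / rate(rpc_request_total_count[5m])"

resp = requests.get(PROM, params={"query": query}, timeout=5).json()
for sample in resp["data"]["result"]:
    print(sample["metric"], "avg RPC time:", sample["value"][1])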

Journal System Metrics

The Journal system is a key component for ensuring data consistency and fault recovery, and its performance directly affects system write performance and data reliability. Monitoring Journal queue length and flush performance can help identify write bottlenecks in a timely manner, prevent data loss risks, and ensure system stability in high-concurrency write scenarios.

| Metric Name | Description |
| --- | --- |
| journal_queue_len | Journal queue length |
| journal_flush_count | Journal flush count |
| journal_flush_time | Journal flush time |

Client Metrics (FUSE/S3 Gateway)

FUSE and S3 Gateway metrics are collected through the Client component.

Cache Metrics

Cache metrics directly reflect the core value of the cache system: improving access performance. The mount cache hit rate is a key indicator of cache effectiveness; a high hit rate means fewer backend accesses and faster responses. These metrics are crucial for evaluating cache strategies and optimizing cache configuration.

| Metric Name | Description |
| --- | --- |
| client_mount_cache_hits | Mount cache hit count |
| client_mount_cache_misses | Mount cache miss count |

I/O Metrics

I/O metrics are core data for evaluating system read/write performance, providing guidance for performance tuning and capacity planning. By monitoring read/write bytes and duration, read/write throughput can be calculated, accurately assessing system I/O performance bottlenecks, optimizing storage strategies, and ensuring stable performance in high-concurrency access scenarios.

| Metric Name | Description |
| --- | --- |
| client_write_bytes | Write bytes |
| client_write_time_us | Write time (microseconds) |
| client_read_bytes | Read bytes |
| client_read_time_us | Read time (microseconds) |
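
As described above, throughput can be derived from the byte and time counters; a small Python sketch with made-up sample values:

# Sample counter values, for illustration only
client_write_bytes = 8 * 1024**3      # cumulative bytes written
client_write_time_us = 4_000_000      # cumulative write time in microseconds

throughput = client_write_bytes / (client_write_time_us / 1e6)
print(f"average write throughput: {throughput / 1024**2:.0f} MB/s")  # 2048 MB/s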

Metadata Operation Metrics

Metadata operation performance directly affects file system response speed, crucial for improving user experience and overall system performance. Analysis of metadata operation duration helps identify metadata management bottlenecks, optimize directory structure, and improve file system operation efficiency.

| Metric Name | Description |
| --- | --- |
| client_metadata_operation_duration | Metadata operation duration |

Worker Node Metrics

As data storage nodes, Worker nodes provide comprehensive storage and performance metrics:

Capacity Metrics

Worker node capacity metrics are core to data storage management, playing a key role in load balancing, data migration, and capacity planning. By monitoring storage usage of each node, intelligent data distribution can be achieved, preventing single-point overload and ensuring optimal utilization of storage resources across the entire cache cluster.

| Metric Name | Description |
| --- | --- |
| capacity | Total storage capacity |
| available | Available storage space |
| fs_used | File system used space |
| num_blocks | Total number of blocks |
| num_blocks_to_delete | Number of blocks to be deleted |

I/O Metrics

Worker node I/O metrics reflect the actual performance of data storage, crucial for evaluating storage hardware efficiency and optimizing data access patterns. Detailed read/write statistics help identify hot data, optimize data layout, and improve overall storage performance and response speed.

| Metric Name | Description |
| --- | --- |
| write_bytes | Write bytes |
| write_time_us | Write time (microseconds) |
| write_count | Write count |
| write_blocks | Write blocks (classified by type) |
| read_bytes | Read bytes |
| read_time_us | Read time (microseconds) |
| read_count | Read count |
| read_blocks | Read blocks (classified by type) |

Resource Metrics

Worker node resource usage directly affects the stability and performance of data storage services, significant for resource scheduling and performance optimization. Reasonable memory usage is the foundation for ensuring data cache efficiency and needs precise monitoring to avoid resource competition and performance degradation.

| Metric Name | Description |
| --- | --- |
| used_memory_bytes | Used memory in bytes |

Hardware Status Metrics

Hardware status metrics are an important monitoring dimension for ensuring data reliability and system availability, crucial for preventive maintenance and rapid fault response. By monitoring disk health status in real-time, hardware failure risks can be identified early, enabling timely data migration and hardware replacement, ensuring data security and continuous availability of the cache system.

| Metric Name | Description |
| --- | --- |
| failed_disks | Number of failed storage devices |
| total_disks | Total number of storage disks |
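
A simple automated check against these two gauges, for example flagging any failed disk, might look like the following (the Worker metrics endpoint and output format are assumptions; adjust to your deployment):

import re
import urllib.request

METRICS_URL = "http://worker:9001/metrics"  # placeholder address

text = urllib.request.urlopen(METRICS_URL, timeout=5).read().decode()
match = re.search(r"^failed_disks\s+(\d+)", text, re.MULTILINE)
if match and int(match.group(1)) > 0:
    print(f"ALERT: {match.group(1)} failed disk(s); plan migration and replacement")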

Global Dashboards

Master

Worker

Client

Summary

The observability design of the Curvine distributed cache system covers the complete chain from metadata management to data storage, achieving the following through fine-grained metric collection:

  • End-to-End Monitoring: Complete monitoring from client requests to data storage, ensuring performance and status of each link are observable
  • Multi-Dimensional Observation: Covering multiple dimensions including performance, capacity, and status, providing a comprehensive view of system health
  • Real-Time Alerting: Real-time monitoring and alerting based on key metrics, enabling timely detection of anomalies and rapid response
  • Fault Diagnosis: Detailed metric data supports rapid fault location, reducing fault recovery time
  • Performance Optimization: Continuous monitoring and analysis provide data support for system performance tuning
  • Capacity Planning: Based on historical trends and real-time data, providing decision-making basis for capacity expansion and resource optimization

Through this comprehensive monitoring system, Curvine can maintain high availability and high performance in complex distributed environments, providing users with stable and reliable cache services.

Curvine Distributed Cache System User Guide

· 12 min read



🎯 System Overview

Curvine is a high-performance, cloud-native distributed caching system designed for modern data-intensive applications. It provides an intelligent caching layer between underlying storage (UFS) and compute engines, significantly improving data access performance.

๐Ÿ† Performance Advantagesโ€‹

Compared to traditional storage access methods, Curvine can provide:

| Metric | Cloud Storage | Curvine Cache | Performance Improvement |
| --- | --- | --- | --- |
| Read latency | 100-500 ms | 1-10 ms | 10-50x |
| Throughput | 100-500 MB/s | 1-10 GB/s | 10-20x |
| IOPS | 1K-10K | 100K-1M | 100x |
| Concurrent connections | 100-1K | 10K-100K | 100x |

Core Components

  • Master Cluster: Metadata management, cache scheduling, consistency guarantees
  • Worker Nodes: Data caching, I/O processing, task execution
  • Client SDK: Multi-language clients, supporting Rust, Fuse, Java, Python
  • Job Manager: Distributed task scheduling and management
  • Metrics System: Real-time monitoring and performance analysis

📂 Path Mount Management

Mounting is the first step in using the Curvine cache: it establishes the mapping between underlying storage (UFS) paths and cache paths.

Mounting Modes Explained

Curvine supports two flexible mounting modes:

🎯 CST Mode (Consistent Path Mode)

# Consistent path, easy to manage and maintain
bin/cv mount s3://bucket/data /bucket/data --mnt-type cst

Ideal scenarios:

  • Data lake scenarios with clear path structures
  • Production environments requiring intuitive path mapping
  • Data platforms with multi-team collaboration

🔀 Orch Mode (Orchestration Mode)

# Flexible path mapping, supporting complex path transformations
bin/cv mount s3://complex-bucket/deep/nested/path /simple/data --mnt-type orch

Ideal scenarios:

  • Complex storage hierarchies
  • Scenarios requiring path abstraction
  • Multi-cloud storage unified access

Complete Mounting Example

# Mount S3 storage to Curvine (production-grade configuration)
bin/cv mount \
s3://bucket/warehouse/tpch_500g.db/orders \
/bucket/warehouse/tpch_500g.db/orders \
--ttl-ms 24h \
--ttl-action delete \
--replicas 3 \
--block-size 128MB \
--consistency-strategy always \
--storage-type ssd \
-c s3.endpoint_url=https://s3.ap-southeast-1.amazonaws.com \
-c s3.credentials.access=access_key \
-c s3.credentials.secret=secret_key \
-c s3.region_name=ap-southeast-1

Mounting Parameters Explained

| Parameter | Type | Default | Description | Example |
| --- | --- | --- | --- | --- |
| --ttl-ms | duration | 0 | Cache data expiration time | 24h, 7d, 30d |
| --ttl-action | enum | none | Expiration policy: delete/none | delete |
| --replicas | int | 1 | Number of data replicas (1-5) | 3 |
| --block-size | size | 128MB | Cache block size | 64MB, 128MB, 256MB |
| --consistency-strategy | enum | always | Consistency strategy | none/always/period |
| --storage-type | enum | disk | Storage medium type | mem/ssd/disk |

Mount Point Management

# View all mount points
bin/cv mount

# Unmount path
bin/cv unmount /bucket/warehouse/tpch_500g.db/orders

💾 Intelligent Caching Strategies

Curvine provides multiple intelligent caching strategies, from passive response to active prediction, comprehensively optimizing data access performance.

Active Data Preloading

Active loading allows you to warm up the cache before business peaks to ensure optimal performance:

# Basic loading
bin/cv load s3://bucket/warehouse/critical-dataset

# Synchronous loading with progress monitoring
bin/cv load s3://bucket/warehouse/critical-dataset -w

Automatic Caching Strategy

Curvine's automatic caching system has significant advantages over traditional solutions:

✨ Curvine Intelligent Cache Architecture


Core Advantage Comparison

| Feature | Open-Source Competitors | Curvine | Advantage Description |
| --- | --- | --- | --- |
| Loading granularity | Block-level | File/directory-level | Avoids fragmentation, ensures integrity |
| Duplicate processing | Duplicate loading exists | Intelligent deduplication | Saves bandwidth and storage resources |
| Task scheduling | Simple queue | Distributed Job Manager | Efficient concurrency, load balancing |
| Consistency guarantee | Passive checking | Active awareness | Real-time data synchronization |

🔄 Data Consistency Guarantees

Data consistency is a core challenge for caching systems, and Curvine provides multi-level consistency guarantee mechanisms.

Consistency Strategy Details

1. 🚫 None Mode - Highest Performance

bin/cv mount s3://bucket/path /bucket/path --consistency-strategy=none

  • Ideal scenarios: Static data, archived data, read-only datasets
  • Performance: ⭐⭐⭐⭐⭐ (fastest)
  • Consistency: ⭐⭐ (TTL-dependent)

2. ✅ Always Mode - Strong Consistency

bin/cv mount s3://bucket/path /bucket/path --consistency-strategy=always

  • Ideal scenarios: Frequently updated business data, critical business systems
  • Performance: ⭐⭐⭐ (has overhead)
  • Consistency: ⭐⭐⭐⭐⭐ (strong consistency)

3. ๐Ÿ•ฐ๏ธ Period Mode - Balanced Solutionโ€‹

bin/cv mount s3://bucket/path /bucket/path \
--consistency-strategy=period \
--check-interval=5m
  • Ideal scenarios: Data with predictable update frequency
  • Performance: โญโญโญโญ (good)
  • Consistency: โญโญโญโญ (periodically guaranteed)

Cache Performance Monitoring

Monitoring cache hit ratio is an important way to evaluate the effectiveness of consistency strategies:

# Get cache hit ratio
curl -s http://master:9001/metrics | grep -E "(cache_hits|cache_misses)"
client_mount_cache_hits{id="3108497238"} 823307
client_mount_cache_misses{id="3108497238"} 4380
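
From the sample counters above, the hit ratio works out to about 99.5%; in Python:

hits, misses = 823307, 4380
print(f"hit ratio: {hits / (hits + misses):.2%}")  # 99.47%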

🤖 AI/ML Scenario Applications

AI and machine learning workloads have extremely high requirements for storage performance, and Curvine provides specially optimized functions for this.

Deep Learning Training Optimization

# Optimized data loading for GPU clusters
bin/cv mount s3://datasets/imagenet /datasets/imagenet \
--storage-type=mem \
--block-size=1GB \
--replicas=2

Model Serving Scenarios

# Model file caching (low-latency access)
bin/cv mount s3://model/bert-large /models/bert-large \
--storage-type=mem \
--ttl-ms=none \
--consistency-strategy=always

# Inference data caching
bin/cv mount s3://inference/input /inference/input \
--storage-type=ssd \
--ttl-ms=1h \
--consistency-strategy=none

🔗 POSIX Semantics and FUSE Access

Curvine perfectly supports POSIX semantics through the FUSE (Filesystem in Userspace) interface, allowing the Curvine cluster to be mounted as a local file system, providing a transparent file access experience for AI/ML applications.

FUSE Usage in AI/ML Training

1. Deep Learning Training Data Access

# PyTorch training script
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os

class CurvineImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        """
        Directly access data in Curvine through the FUSE mount point.
        root_dir: FUSE mount point path, such as /curvine-fuse/datasets/imagenet
        """
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []

        # Directly traverse the FUSE-mounted directory
        for class_dir in os.listdir(root_dir):
            class_path = os.path.join(root_dir, class_dir)
            if os.path.isdir(class_path):
                for img_file in os.listdir(class_path):
                    if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                        self.image_paths.append(os.path.join(class_path, img_file))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Access data through standard file operations, enjoying Curvine cache acceleration
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        # Extract the label from the path
        label = os.path.basename(os.path.dirname(img_path))
        return image, label

# Usage example
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Directly use the path of the FUSE mount point
dataset = CurvineImageDataset(
    root_dir='/curvine-fuse/datasets/imagenet/train',
    transform=transform
)

dataloader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,
    pin_memory=True
)

# Training loop (model, criterion, and num_epochs are defined elsewhere)
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(dataloader):
        # Data is automatically loaded from the Curvine cache through FUSE,
        # with near-memory access speed
        outputs = model(data.cuda())
        loss = criterion(outputs, targets.cuda())
        # ... training logic

2. TensorFlow/Keras Data Pipeline

import tensorflow as tf
import os

def create_curvine_dataset(data_dir, batch_size=32):
    """
    Create a TensorFlow data pipeline through the FUSE mount point.
    data_dir: FUSE-mounted data directory
    """

    # Directly access FUSE-mounted data using standard file APIs
    def load_and_preprocess_image(path):
        # TensorFlow transparently accesses the Curvine cache through FUSE
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, [224, 224])
        image = tf.cast(image, tf.float32) / 255.0
        return image

    # Scan files in the FUSE-mounted directory
    image_paths = []
    labels = []

    for class_name in os.listdir(data_dir):
        class_dir = os.path.join(data_dir, class_name)
        if os.path.isdir(class_dir):
            for img_file in os.listdir(class_dir):
                if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                    image_paths.append(os.path.join(class_dir, img_file))
                    labels.append(class_name)

    # Create the dataset
    path_ds = tf.data.Dataset.from_tensor_slices(image_paths)
    label_ds = tf.data.Dataset.from_tensor_slices(labels)

    # Apply preprocessing
    image_ds = path_ds.map(
        load_and_preprocess_image,
        num_parallel_calls=tf.data.AUTOTUNE
    )

    # Combine data and labels
    dataset = tf.data.Dataset.zip((image_ds, label_ds))

    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Usage example
train_dataset = create_curvine_dataset('/curvine-fuse/datasets/imagenet/train')
val_dataset = create_curvine_dataset('/curvine-fuse/datasets/imagenet/val')

# Model training (model is defined elsewhere)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=50,
    callbacks=[
        tf.keras.callbacks.ModelCheckpoint('/curvine-fuse/models/checkpoints/'),
        tf.keras.callbacks.TensorBoard(log_dir='/curvine-fuse/logs/')
    ]
)

๐Ÿ—„๏ธ Big Data Ecosystem Integrationโ€‹

Curvine seamlessly integrates with mainstream big data frameworks, providing transparent cache acceleration capabilities.

Hadoop Ecosystem Integration

Basic Configuration

Add the following to hdfs-site.xml or core-site.xml:

<!-- Curvine FileSystem implementation -->
<property>
  <name>fs.cv.impl</name>
  <value>io.curvine.CurvineFileSystem</value>
</property>

<!-- Single cluster configuration -->
<property>
  <name>fs.cv.master_addrs</name>
  <value>master1:8995,master2:8995,master3:8995</value>
</property>

Multi-cluster Support

<!-- Cluster 1: Production environment -->
<property>
  <name>fs.cv.production.master_addrs</name>
  <value>prod-master1:8995,prod-master2:8995,prod-master3:8995</value>
</property>

<!-- Cluster 2: Development environment -->
<property>
  <name>fs.cv.development.master_addrs</name>
  <value>dev-master1:8995,dev-master2:8995</value>
</property>

<!-- Cluster 3: Machine learning dedicated cluster -->
<property>
  <name>fs.cv.ml-cluster.master_addrs</name>
  <value>ml-master1:8995,ml-master2:8995,ml-master3:8995</value>
</property>

🔄 UFS Transparent Proxy

To better support existing Java applications to seamlessly access Curvine cache, we provide a UFS transparent proxy solution. The core advantage of this solution is zero code modification, allowing existing applications to immediately enjoy the cache acceleration effects of Curvine.

✨ Core Features of Transparent Proxy

  • 🚫 Zero code modification: Preserves all original interfaces unchanged, no business code modifications required
  • 🔍 Intelligent path recognition: Only determines whether the path has been mounted to Curvine when opening a file
  • ⚡ Automatic cache acceleration: Automatically enables cache acceleration for mounted paths, native S3 access for unmounted paths
  • 🔄 Smooth switching: Supports dynamically switching whether to use cache at runtime without restarting the application

๐Ÿ› ๏ธ Configuration Methodโ€‹

Simply replace the S3FileSystem implementation class in Hadoop configuration:

<!-- Traditional S3 access configuration -->
<!--
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
-->

<!-- Replace with Curvine transparent proxy -->
<property>
  <name>fs.s3a.impl</name>
  <value>io.curvine.S3AProxyFileSystem</value>
</property>

<property>
  <name>fs.cv.impl</name>
  <value>io.curvine.CurvineFileSystem</value>
</property>

<!-- Curvine cluster configuration -->
<property>
  <name>fs.curvine.master_addrs</name>
  <value>master1:8995,master2:8995,master3:8995</value>
</property>

🔧 Working Principle


🚀 Usage Example

No need to modify any business code, original code directly enjoys acceleration:

// Business code remains completely unchanged!
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);

// If this path is mounted to Curvine, automatically enjoy cache acceleration
FSDataInputStream input = fs.open(new Path("s3a://my-bucket/warehouse/data.parquet"));

// If this path is not mounted, use native S3 access
FSDataInputStream input2 = fs.open(new Path("s3a://my-bucket/archive/old-data.parquet"));

Spark/MapReduce code example:

// Spark code does not need any modification
Dataset<Row> df = spark.read()
.option("header", "true")
// If /warehouse/ path is mounted, automatically use cache acceleration
.csv("s3a://data-lake/warehouse/customer_data/");

df.groupBy("region")
.agg(sum("revenue").alias("total_revenue"))
.orderBy(desc("total_revenue"))
.show(20);

Python PySpark example:

# Python code also does not need modification
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum, desc

spark = SparkSession.builder.appName("TransparentCache").getOrCreate()

# Automatically determine whether to use cache
df = spark.read \
.option("header", "true") \
.csv("s3a://data-lake/analytics/events/")

result = df.groupBy("event_type") \
.agg(sum("count").alias("total_events")) \
.orderBy(desc("total_events"))

result.show()

Apache Spark Optimization Configuration

# Spark application startup configuration
spark-submit \
--class com.example.SparkApp \
--master yarn \
--deploy-mode cluster \
--conf spark.hadoop.fs.cv.impl=io.curvine.CurvineFileSystem \
--conf spark.hadoop.fs.cv.master_addrs=master1:8995,master2:8995,master3:8995 \
--conf spark.sql.adaptive.enabled=true \
--jars curvine-hadoop-client.jar \
app.jar

Spark Code Example

// Scala example
val spark = SparkSession.builder()
.appName("Curvine Demo")
.config("spark.hadoop.fs.cv.impl", "io.curvine.CurvineFileSystem")
.getOrCreate()

// Directly use cv:// protocol to access cached data
val df = spark.read
.option("multiline", "true")
.json("cv://production/warehouse/events/2024/01/01/")

df.groupBy("event_type")
.count()
.show()

// Multi-cluster access
val prodData = spark.read.parquet("cv://production/warehouse/sales/")
val mlData = spark.read.parquet("cv://ml-cluster/features/user_profiles/")

# Python example
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("Curvine Python Demo") \
.config("spark.hadoop.fs.cv.impl", "io.curvine.CurvineFileSystem") \
.config("spark.hadoop.fs.cv.master_addrs", "master1:8995,master2:8995") \
.getOrCreate()

# Read data from cache
df = spark.read.option("header", "true") \
.csv("cv://warehouse/customer_data/")

# Complex queries automatically enjoy cache acceleration
result = df.groupBy("region") \
.agg({"revenue": "sum", "orders": "count"}) \
.orderBy("sum(revenue)", ascending=False)

result.show(20)

Trino/Presto Plugin Integration

Curvine provides an intelligent path-replacement plugin that achieves non-invasive, fully transparent cache acceleration without any business-code modifications:

Plugin Workflow


Spark plugin usage example:

spark-submit \
--class main.scala.Tpch \
--name tpch_demo \
--conf spark.hadoop.fs.cv.impl=io.curvine.CurvineFileSystem \
--conf spark.hadoop.fs.cv.default.master_addrs=master1:8995,master2:8995 \
--conf spark.sql.extensions=io.curvine.spark.CurvineSparkExtension

Flink Table API integration example:
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Configure Curvine FileSystem
Configuration config = new Configuration();
config.setString("fs.cv.impl", "io.curvine.CurvineFileSystem");
config.setString("fs.cv.master_addrs", "master1:8995,master2:8995");

// Create Curvine table
tableEnv.executeSql(
"CREATE TABLE user_events (" +
" user_id BIGINT," +
" event_type STRING," +
" timestamp_ms BIGINT," +
" properties MAP<STRING, STRING>" +
") WITH (" +
" 'connector' = 'filesystem'," +
" 'path' = 'cv://streaming/events/'," +
" 'format' = 'json'" +
")"
);

// Real-time query enjoys cache acceleration
Table result = tableEnv.sqlQuery(
"SELECT user_id, COUNT(*) as event_count " +
"FROM user_events " +
"WHERE timestamp_ms > UNIX_TIMESTAMP() * 1000 - 3600000 " +
"GROUP BY user_id"
);


💡 Best Practices

🎯 Mounting Strategy Best Practices

Tiered Mounting by Business Scenarios

# Hot data: high-frequency access, using memory cache
bin/cv mount s3://bucket/hot /bucket/hot \
--storage-type=mem \
--replicas=3 \
--ttl-ms=1d \
--ttl-action=delete

# Warm data: regular access, using SSD cache
bin/cv mount s3://bucket/warm /bucket/warm \
--storage-type=ssd \
--replicas=2 \
--ttl-ms=7d \
--ttl-action=delete


# Cold data: low-frequency access, using disk cache
bin/cv mount s3://bucket/cold /bucket/cold \
--storage-type=disk \
--replicas=1 \
--ttl-ms=30d \
--ttl-action=delete

Optimization by Data Type

# Small file intensive (e.g., logs, configurations)
bin/cv mount s3://bucket/logs /bucket/logs \
--block-size=4MB \
--consistency-strategy=none

# Large file type (e.g., videos, models)
bin/cv mount s3://bucket/models /bucket/models \
--block-size=1GB \
--consistency-strategy=always

# Analytical data (e.g., Parquet)
bin/cv mount s3://bucket/analytics /bucket/analytics \
--block-size=128MB \
--consistency-strategy=none

🎯 Summary

As a new generation distributed caching system, Curvine provides excellent performance improvements for modern data-intensive applications through intelligent caching strategies, strong consistency guarantees, and seamless ecosystem integration.

๐Ÿ† Core Valuesโ€‹

  • 🚀 Performance Improvement: 10-100x access acceleration, significantly reducing data access latency
  • 💰 Cost Optimization: Reduce cloud storage access costs, improve computing resource utilization
  • 🛡️ Data Security: Multiple consistency guarantees to ensure data accuracy and integrity
  • 🌐 Ecosystem Friendly: Seamless integration with mainstream big data and AI frameworks

Curvine - Make data access lightning fast ⚡

Building a Curvine Cluster from Scratch & FIO Testing

· 2 min read
Founder of Curvine

How to quickly get started and try out Curvine's performance? This article will introduce how to build a local small cluster from scratch, allowing everyone to get hands-on experience quickly.

GitHub: https://github.com/CurvineIO/curvine


1. Download the Code

git clone https://github.com/CurvineIO/curvine.git

2. Environment Requirements

GCC: version 10 or later 
Rust: version 1.86 or later
Protobuf: version 3.x
Maven: version 3.8 or later
LLVM: version 12 or later
FUSE: libfuse2 or libfuse3 development packages
JDK: version 1.8 or later
npm: version 9 or later
Python: version 3.7 or later

3. Compile & Run

make all

To facilitate compilation, our build script will check dependencies in advance. For macOS users, we will temporarily skip FUSE compilation (currently not adapted for macOS). Interested users can consider using the macfuse project for adaptation.

make-checkenv

4. After Compilation, Start Local Cluster

cd build/dist
./bin/restart-all.sh

After successful startup, execute the report command to check if it's working:


bin/cv report

active_master: localhost:8995
journal_nodes: 1,localhost:8996
capacity: 233.5GB
available: 105.0GB (44.99%)
fs_used: 0.0B (0.00%)
non_fs_used: 128.4GB
live_worker_num: 1
lost_worker_num: 0
inode_num: 0
block_num: 0
live_worker_list: 192.168.xxx.xxx:8997,105.0GB/233.5GB (44.99%)
lost_worker_list:

5. View Local Master and Worker WebUI

http://localhost:9000/
http://localhost:9001/


6. FIO Testing

Test Environment: Alibaba Cloud ecs.r8a.8xlarge instances, one each for master, worker, and client

  • 32 cores (vCPU)
  • 256 GiB memory
  • System disk and data disk both: ESSD cloud disk 500 GiB (7800 IOPS)
  • Maximum bandwidth: 25 Gbit/s

Prepare data (on worker machine):

bin/curvine-bench.sh fuse.write

FIO Sequential Read Test, 8 Concurrent Jobs

fio -iodepth=1 -rw=read -ioengine=libaio -bs=256k \
  -group_reporting -size=200gb \
  -filename=/curvine-fuse/fs-bench/0 \
  -name=read_test --readonly -direct=1 --runtime=60 \
  -numjobs=8

FIO Random Read Test, 8 Concurrent Jobs


fio -iodepth=1 -rw=randread -ioengine=libaio -bs=256k \
  -group_reporting -size=200gb \
  -filename=/curvine-fuse/fs-bench/0 \
  -name=read_test --readonly -direct=1 --runtime=60 \
  -numjobs=8

Finally, here's a video demonstration of the FIO testing results: