Curvine Overview
Curvine is a high-performance, high-concurrency distributed caching system released under the Apache 2.0 open-source license. It provides unified path access to various storage systems with caching acceleration, featuring POSIX compatibility for seamless integration as massive local storage, while supporting concurrent mount and read/write operations across cross-platform and cross-region hosts.
Curvine adopts the classic master-worker architecture to implement distributed file system design. The Master manages file metadata and ensures high availability and data consistency through the Raft protocol, while Workers handle data management.
Curvine provides rich APIs suitable for various forms of data management, analysis, archiving, and backup. It can seamlessly integrate with big data, machine learning, and artificial intelligence platforms without code modifications, providing massive, elastic, and cost-effective caching acceleration.
Core Features​
- High Performance: Millisecond-level latency with exceptional performance for writes, sequential reads, and random reads. Single node can achieve up to 15GiB/s read throughput
- High Concurrency: Single node supports tens of thousands of concurrent file read/write operations
- Low Resource Consumption: In 1000 concurrent read/write tests, both server and client require only tens of MB of memory; CPU usage is reduced by 50% compared to other systems
- Multi-Backend Storage Support: Supports S3, HDFS, OSS, MinIO, and other backend storage systems with unified access interface
- POSIX Compatibility: Functions like a local file system with seamless integration to existing applications without business intrusion
- Distributed Design: Same file system can be mounted simultaneously on thousands of servers for high-performance concurrent read/write and data sharing
- Multi-Language Support: Provides Java, Python, Rust, FUSE, and other client implementations
- Multi-OS Support: Supports Linux, Windows, macOS, and other operating systems
- Multi-Architecture Support: Supports x86, ARM, and other hardware architectures
Use Cases​
Curvine is designed for high-performance, high-concurrency, and massive data caching scenarios:
- Deep Learning Training: Provides high-speed data access for deep learning training, significantly reducing data loading time, improving GPU utilization, and accelerating model training processes
- Large Language Model Inference: Optimizes data access for LLM inference scenarios, reducing inference latency and improving model service response speed and throughput
- Analytical Databases and OLAP Engines: Provides high-speed caching for analytical databases and OLAP engines, significantly improving complex query performance and reducing data analysis time
- Big Data Computing: Provides high-speed caching for big data computing scenarios, reducing data read/write time
- Shuffle Data Storage: Stores intermediate results (shuffle) during big data computation, achieving complete separation of compute and storage
- Multi-cloud Data Caching: Improves data access efficiency across clouds and regions