
Building Observability for Curvine

· 7 min read

As a high-performance distributed cache system, Curvine has strict requirements for performance, stability, and reliability. To keep the system performing optimally under various load conditions and to locate and resolve potential issues quickly, we have built a comprehensive monitoring solution based on Prometheus and Grafana. It provides deep observability for Master nodes, Worker nodes, FUSE nodes, and the S3 Gateway: by collecting key metrics from each component, it enables real-time monitoring of cache cluster scale, operational status, performance, and resource usage.

Monitoring Architecture

This monitoring system adopts the following core components:

  • Prometheus: Responsible for metric collection, storage, and querying
  • Grafana: Provides data visualization and dashboard display
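
As a brief illustration (not part of Curvine itself), the Python sketch below reads collected metrics back through Prometheus' HTTP query API; the Prometheus address and the example expressions are assumptions to adapt to your own deployment.

import requests

PROMETHEUS_URL = "http://prometheus:9090"  # assumed Prometheus server address

def prom_query(expr: str) -> float:
    """Run an instant PromQL query and return the first value, or NaN if there is no result."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": expr}, timeout=5)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("nan")

# Example: read two of the Master capacity metrics described below
print("capacity:", prom_query("capacity"))
print("fs_used:", prom_query("fs_used"))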

Observability Metrics

Master Node Metrics

As the cluster's metadata management center, Master nodes provide the following key metrics:

Capacity Metrics

Capacity metrics are fundamental for evaluating system storage resource usage, crucial for capacity planning, resource optimization, and preventive maintenance. By monitoring these metrics, storage bottlenecks can be identified in a timely manner, capacity requirements can be predicted, and sufficient space can be ensured to handle business growth.

| Metric Name | Description |
|---|---|
| inode_dir_num | Number of directories |
| inode_file_num | Number of files |
| num_blocks | Total number of blocks |
| blocks_size_avg | Average block size |
| capacity | Total storage capacity |
| available | Available storage space |
| fs_used | File system used space |

Resource Metrics

Resource metrics reflect the system's usage of computing resources, significant for performance tuning, resource allocation, and fault prevention. Memory usage directly affects system performance and stability, especially for RocksDB as the core storage engine, whose memory usage needs precise monitoring to avoid memory overflow and performance degradation.

| Metric Name | Description |
|---|---|
| used_memory_bytes | Used memory in bytes |
| rocksdb_used_memory_bytes | RocksDB memory usage |

Cluster Status Metrics

Cluster status metrics provide a real-time view of the overall system health, crucial for ensuring high availability and data consistency. By monitoring Worker node status and replication task execution, node failures, data inconsistencies, and other issues can be quickly identified, ensuring reliable operation of the distributed cache system.

| Metric Name | Description |
|---|---|
| worker_num | Number of workers (classified by status) |
| replication_staging_number | Number of blocks waiting for replication |
| replication_inflight_number | Number of blocks currently being replicated |
| replication_failure_count | Total cumulative replication failures |

Performance Metrics

Performance metrics are core indicators for measuring system responsiveness and processing efficiency, playing a key role in performance optimization and capacity planning. The total count and total time of RPC requests can be used to calculate average response time, directly reflecting system processing capability, while analysis of various operation durations helps identify performance bottlenecks and guide system optimization.

| Metric Name | Description |
|---|---|
| rpc_request_total_count | Total RPC request count |
| rpc_request_total_time | Total RPC request time |
| operation_duration | Operation duration (classified by type, excluding heartbeat) |
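
As a hedged sketch of the average-response-time calculation described above (the result's unit depends on how Curvine exports rpc_request_total_time, and the 5m PromQL window is an arbitrary choice):

def avg_rpc_response_time(time_delta: float, count_delta: float) -> float:
    """Average RPC response time over an interval.

    time_delta:  increase of rpc_request_total_time over the interval
    count_delta: increase of rpc_request_total_count over the same interval
    """
    return time_delta / count_delta if count_delta > 0 else 0.0

# The same idea in PromQL, assuming both metrics are counters:
#   rate(rpc_request_total_time[5m]) / rate(rpc_request_total_count[5m])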

Journal System Metrics

The Journal system is a key component for ensuring data consistency and fault recovery, and its performance directly affects system write performance and data reliability. Monitoring Journal queue length and flush performance can help identify write bottlenecks in a timely manner, prevent data loss risks, and ensure system stability in high-concurrency write scenarios.

| Metric Name | Description |
|---|---|
| journal_queue_len | Journal queue length |
| journal_flush_count | Journal flush count |
| journal_flush_time | Journal flush time |

Client Metrics (Fuse/S3 Gateway)

Fuse and S3 Gateway metrics are collected through the Client.

Cache Metrics

Cache metrics directly reflect the core value of the cache system - improving access performance. Mount cache hit rate is a key indicator for measuring cache effectiveness, where high hit rates mean fewer backend accesses and faster response speeds. These metrics are crucial for evaluating cache strategy effectiveness and optimizing cache configuration.

| Metric Name | Description |
|---|---|
| client_mount_cache_hits | Mount cache hit count |
| client_mount_cache_misses | Mount cache miss count |
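
A minimal sketch of the hit-ratio calculation these two counters enable:

def mount_cache_hit_ratio(hits: float, misses: float) -> float:
    """Mount cache hit ratio from client_mount_cache_hits and client_mount_cache_misses."""
    total = hits + misses
    return hits / total if total > 0 else 0.0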

I/O Metrics

I/O metrics are core data for evaluating system read/write performance, providing guidance for performance tuning and capacity planning. By monitoring read/write bytes and duration, read/write throughput can be calculated, accurately assessing system I/O performance bottlenecks, optimizing storage strategies, and ensuring stable performance in high-concurrency access scenarios.

| Metric Name | Description |
|---|---|
| client_write_bytes | Write bytes |
| client_write_time_us | Write time (microseconds) |
| client_read_bytes | Read bytes |
| client_read_time_us | Read time (microseconds) |
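
A quick sketch of the throughput calculation mentioned above, using deltas of the byte and microsecond counters taken over the same interval:

def throughput_mb_per_s(bytes_delta: float, time_us_delta: float) -> float:
    """Throughput in MB/s, e.g. from client_read_bytes and client_read_time_us deltas."""
    if time_us_delta <= 0:
        return 0.0
    return (bytes_delta / (1024 * 1024)) / (time_us_delta / 1_000_000)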

Metadata Operation Metrics

Metadata operation performance directly affects file system response speed, crucial for improving user experience and overall system performance. Analysis of metadata operation duration helps identify metadata management bottlenecks, optimize directory structure, and improve file system operation efficiency.

| Metric Name | Description |
|---|---|
| client_metadata_operation_duration | Metadata operation duration |

Worker Node Metrics

As data storage nodes, Worker nodes provide comprehensive storage and performance metrics:

Capacity Metrics

Worker node capacity metrics are core to data storage management, playing a key role in load balancing, data migration, and capacity planning. By monitoring storage usage of each node, intelligent data distribution can be achieved, preventing single-point overload and ensuring optimal utilization of storage resources across the entire cache cluster.

| Metric Name | Description |
|---|---|
| capacity | Total storage capacity |
| available | Available storage space |
| fs_used | File system used space |
| num_blocks | Total number of blocks |
| num_blocks_to_delete | Number of blocks to be deleted |

I/O Metrics

Worker node I/O metrics reflect the actual performance of data storage, crucial for evaluating storage hardware efficiency and optimizing data access patterns. Detailed read/write statistics help identify hot data, optimize data layout, and improve overall storage performance and response speed.

| Metric Name | Description |
|---|---|
| write_bytes | Write bytes |
| write_time_us | Write time (microseconds) |
| write_count | Write count |
| write_blocks | Write blocks (classified by type) |
| read_bytes | Read bytes |
| read_time_us | Read time (microseconds) |
| read_count | Read count |
| read_blocks | Read blocks (classified by type) |

Resource Metrics

Worker node resource usage directly affects the stability and performance of data storage services, significant for resource scheduling and performance optimization. Reasonable memory usage is the foundation for ensuring data cache efficiency and needs precise monitoring to avoid resource competition and performance degradation.

| Metric Name | Description |
|---|---|
| used_memory_bytes | Used memory in bytes |

Hardware Status Metrics

Hardware status metrics are an important monitoring dimension for ensuring data reliability and system availability, crucial for preventive maintenance and rapid fault response. By monitoring disk health status in real-time, hardware failure risks can be identified early, enabling timely data migration and hardware replacement, ensuring data security and continuous availability of the cache system.

| Metric Name | Description |
|---|---|
| failed_disks | Number of failed storage devices |
| total_disks | Total number of storage disks |
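
As an illustrative sketch (the endpoint URL, port, and polling interval are assumptions, not Curvine defaults), a small watchdog can poll a Worker's Prometheus-format metrics endpoint and warn as soon as any disk fails:

import time
import requests
from prometheus_client.parser import text_string_to_metric_families

WORKER_METRICS_URL = "http://worker:9001/metrics"  # assumed Worker metrics endpoint

def disk_counts(url: str):
    """Return (failed_disks, total_disks) parsed from the Prometheus text exposition."""
    failed = total = 0.0
    for family in text_string_to_metric_families(requests.get(url, timeout=5).text):
        for sample in family.samples:
            if sample.name == "failed_disks":
                failed = sample.value
            elif sample.name == "total_disks":
                total = sample.value
    return failed, total

while True:
    failed, total = disk_counts(WORKER_METRICS_URL)
    if failed > 0:
        print(f"WARNING: {failed:.0f}/{total:.0f} disks failed; plan data migration and replacement")
    time.sleep(60)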

Global Dashboards

Master

(Master dashboard screenshot)

Worker

(Worker dashboard screenshot)

Client

(Client dashboard screenshot)

Summary

The observability design of the Curvine distributed cache system covers the complete chain from metadata management to data storage, achieving the following through fine-grained metric collection:

  • End-to-End Monitoring: Complete monitoring from client requests to data storage, ensuring performance and status of each link are observable
  • Multi-Dimensional Observation: Covering multiple dimensions including performance, capacity, and status, providing a comprehensive view of system health
  • Real-Time Alerting: Real-time monitoring and alerting based on key metrics, enabling timely detection of anomalies and rapid response
  • Fault Diagnosis: Detailed metric data supports rapid fault location, reducing fault recovery time
  • Performance Optimization: Continuous monitoring and analysis provide data support for system performance tuning
  • Capacity Planning: Based on historical trends and real-time data, providing decision-making basis for capacity expansion and resource optimization

Through this comprehensive monitoring system, Curvine can maintain high availability and high performance in complex distributed environments, providing users with stable and reliable cache services.

Curvine Distributed Cache System User Guide

· 12 min read



🎯 System Overview

Curvine is a high-performance, cloud-native distributed caching system designed for modern data-intensive applications. It provides an intelligent caching layer between underlying storage (UFS) and compute engines, significantly improving data access performance.

๐Ÿ† Performance Advantagesโ€‹

Compared to traditional storage access methods, Curvine can provide:

| Metric | Cloud Storage | Curvine Cache | Performance Improvement |
|---|---|---|---|
| Read Latency | 100-500ms | 1-10ms | 10-50x |
| Throughput | 100-500 MB/s | 1-10 GB/s | 10-20x |
| IOPS | 1K-10K | 100K-1M | 100x |
| Concurrent Connections | 100-1K | 10K-100K | 100x |

Core Components

  • Master Cluster: Metadata management, cache scheduling, consistency guarantees
  • Worker Nodes: Data caching, I/O processing, task execution
  • Client SDK: Multi-language clients, supporting Rust, Fuse, Java, Python
  • Job Manager: Distributed task scheduling and management
  • Metrics System: Real-time monitoring and performance analysis

📂 Path Mount Management

Mounting is the first step in using the Curvine cache: it establishes the mapping between underlying storage (UFS) and cache paths.

Mounting Modes Explained

Curvine supports two flexible mounting modes:

🎯 CST Mode (Consistent Path Mode)

# Consistent path, easy to manage and maintain
bin/cv mount s3://bucket/data /bucket/data --mnt-type cst

Ideal scenarios:

  • Data lake scenarios with clear path structures
  • Production environments requiring intuitive path mapping
  • Data platforms with multi-team collaboration

🔀 Arch Mode (Orchestration Mode)

# Flexible path mapping, supporting complex path transformations
bin/cv mount s3://complex-bucket/deep/nested/path /simple/data --mnt-type arch

Ideal scenarios:

  • Complex storage hierarchies
  • Scenarios requiring path abstraction
  • Multi-cloud storage unified access

Complete Mounting Example

# Mount S3 storage to Curvine (production-grade configuration)
bin/cv mount \
s3://bucket/warehouse/tpch_500g.db/orders \
/bucket/warehouse/tpch_500g.db/orders \
--ttl-ms 24h \
--ttl-action delete \
--replicas 3 \
--block-size 128MB \
--consistency-strategy always \
--storage-type ssd \
-c s3.endpoint_url=https://s3.ap-southeast-1.amazonaws.com \
-c s3.credentials.access=access_key \
-c s3.credentials.secret=secret_key \
-c s3.region_name=ap-southeast-1

Mounting Parameters Explained

| Parameter | Type | Default | Description | Example |
|---|---|---|---|---|
| --ttl-ms | duration | 0 | Cache data expiration time | 24h, 7d, 30d |
| --ttl-action | enum | none | Expiration policy: delete/none | delete |
| --replicas | int | 1 | Number of data replicas (1-5) | 3 |
| --block-size | size | 128MB | Cache block size | 64MB, 128MB, 256MB |
| --consistency-strategy | enum | always | Consistency strategy | none/always/period |
| --storage-type | enum | disk | Storage medium type | mem/ssd/disk |

Mount Point Management

# View all mount points
bin/cv mount

# Unmount path
bin/cv unmount /bucket/warehouse/tpch_500g.db/orders

💾 Intelligent Caching Strategies

Curvine provides multiple intelligent caching strategies, from passive response to active prediction, comprehensively optimizing data access performance.

Active Data Preloading

Active loading allows you to warm up the cache before business peaks to ensure optimal performance:

# Basic loading
bin/cv load s3://bucket/warehouse/critical-dataset

# Synchronous loading with progress monitoring
bin/cv load s3://bucket/warehouse/critical-dataset -w

Automatic Caching Strategy

Curvine's automatic caching system has significant advantages over traditional solutions:

✨ Curvine Intelligent Cache Architecture

(Curvine intelligent cache architecture diagram)

Core Advantage Comparison

| Feature | Open Source Competitors | Curvine | Advantage Description |
|---|---|---|---|
| Loading Granularity | Block-level | File/Directory-level | Avoids fragmentation, ensures integrity |
| Duplicate Processing | Duplicate loading occurs | Intelligent deduplication | Saves bandwidth and storage resources |
| Task Scheduling | Simple queue | Distributed Job Manager | Efficient concurrency, load balancing |
| Consistency Guarantee | Passive checking | Active awareness | Real-time data synchronization |

🔄 Data Consistency Guarantees

Data consistency is a core challenge for caching systems, and Curvine provides multi-level consistency guarantee mechanisms.

Consistency Strategy Details

1. 🚫 None Mode - Highest Performance

bin/cv mount s3://bucket/path /bucket/path --consistency-strategy=none
  • Ideal scenarios: Static data, archived data, read-only datasets
  • Performance: ⭐⭐⭐⭐⭐ (fastest)
  • Consistency: ⭐⭐ (TTL-dependent)

2. ✅ Always Mode - Strong Consistency

bin/cv mount s3://bucket/path /bucket/path --consistency-strategy=always
  • Ideal scenarios: Frequently updated business data, critical business systems
  • Performance: ⭐⭐⭐ (has overhead)
  • Consistency: ⭐⭐⭐⭐⭐ (strong consistency)

3. 🕰️ Period Mode - Balanced Solution

bin/cv mount s3://bucket/path /bucket/path \
--consistency-strategy=period \
--check-interval=5m
  • Ideal scenarios: Data with predictable update frequency
  • Performance: ⭐⭐⭐⭐ (good)
  • Consistency: ⭐⭐⭐⭐ (periodically guaranteed)

Cache Performance Monitoring

Monitoring cache hit ratio is an important way to evaluate the effectiveness of consistency strategies:

# Get cache hit ratio
curl -s http://master:9001/metrics | grep -E "(cache_hits|cache_misses)"
client_mount_cache_hits{id="3108497238"} 823307
client_mount_cache_misses{id="3108497238"} 4380
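
For the sample output above, the hit ratio works out to 823307 / (823307 + 4380) ≈ 99.5%.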

🤖 AI/ML Scenario Applications

AI and machine learning workloads place extremely high demands on storage performance, and Curvine provides capabilities specifically optimized for these scenarios.

Deep Learning Training Optimization

# Optimized data loading for GPU clusters
bin/cv mount s3://datasets/imagenet /datasets/imagenet \
--storage-type=mem \
--block-size=1GB \
--replicas=2

Model Serving Scenarios

# Model file caching (low-latency access)
bin/cv mount s3://model/bert-large /models/bert-large \
--storage-type=mem \
--ttl-ms=none \
--consistency-strategy=always

# Inference data caching
bin/cv mount s3://inference/input /inference/input \
--storage-type=ssd \
--ttl-ms=1h \
--consistency-strategy=none

🔗 POSIX Semantics and FUSE Access

Curvine fully supports POSIX semantics through the FUSE (Filesystem in Userspace) interface, allowing a Curvine cluster to be mounted as a local file system and providing a transparent file access experience for AI/ML applications.

FUSE Usage in AI/ML Training

1. Deep Learning Training Data Access
# PyTorch training script
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os

class CurvineImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        """
        Directly access data in Curvine through the FUSE mount point.
        root_dir: FUSE mount point path, such as /curvine-fuse/datasets/imagenet
        """
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []
        self.class_to_idx = {}

        # Directly traverse the FUSE-mounted directory
        for class_dir in sorted(os.listdir(root_dir)):
            class_path = os.path.join(root_dir, class_dir)
            if os.path.isdir(class_path):
                # Map each class directory to an integer label so batches collate cleanly
                self.class_to_idx[class_dir] = len(self.class_to_idx)
                for img_file in os.listdir(class_path):
                    if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                        self.image_paths.append(os.path.join(class_path, img_file))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Access data through standard file operations, enjoying Curvine cache acceleration
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        # Extract the class name from the path and convert it to an integer label
        label = self.class_to_idx[os.path.basename(os.path.dirname(img_path))]
        return image, label

# Usage example
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Directly use the path of the FUSE mount point
dataset = CurvineImageDataset(
    root_dir='/curvine-fuse/datasets/imagenet/train',
    transform=transform
)

dataloader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,
    pin_memory=True
)

# Training loop (model, criterion, and num_epochs are defined elsewhere)
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(dataloader):
        # Data is automatically loaded from the Curvine cache through FUSE,
        # enjoying near-memory access speed
        outputs = model(data.cuda())
        loss = criterion(outputs, targets.cuda())
        # ... training logic
2. TensorFlow/Keras Data Pipeline
import tensorflow as tf
import os

def create_curvine_dataset(data_dir, batch_size=32):
    """
    Create a TensorFlow data pipeline through the FUSE mount point.
    data_dir: FUSE-mounted data directory
    """

    # Directly access FUSE-mounted data using standard file APIs
    def load_and_preprocess_image(path):
        # TensorFlow transparently accesses the Curvine cache through FUSE
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, [224, 224])
        image = tf.cast(image, tf.float32) / 255.0
        return image

    # Scan files in the FUSE-mounted directory
    image_paths = []
    labels = []

    class_names = sorted(
        d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))
    )
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}

    for class_name in class_names:
        class_dir = os.path.join(data_dir, class_name)
        for img_file in os.listdir(class_dir):
            if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_paths.append(os.path.join(class_dir, img_file))
                # Store integer class indices so they can be fed directly to the loss
                labels.append(class_to_idx[class_name])

    # Create dataset
    path_ds = tf.data.Dataset.from_tensor_slices(image_paths)
    label_ds = tf.data.Dataset.from_tensor_slices(labels)

    # Apply preprocessing
    image_ds = path_ds.map(
        load_and_preprocess_image,
        num_parallel_calls=tf.data.AUTOTUNE
    )

    # Combine data and labels
    dataset = tf.data.Dataset.zip((image_ds, label_ds))

    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Usage example
train_dataset = create_curvine_dataset('/curvine-fuse/datasets/imagenet/train')
val_dataset = create_curvine_dataset('/curvine-fuse/datasets/imagenet/val')

# Model training (model is defined and compiled elsewhere)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=50,
    callbacks=[
        tf.keras.callbacks.ModelCheckpoint('/curvine-fuse/models/checkpoints/'),
        tf.keras.callbacks.TensorBoard(log_dir='/curvine-fuse/logs/')
    ]
)

🗄️ Big Data Ecosystem Integration

Curvine seamlessly integrates with mainstream big data frameworks, providing transparent cache acceleration capabilities.

Hadoop Ecosystem Integration

Basic Configuration

Add in hdfs-site.xml or core-site.xml:

<!-- Curvine FileSystem implementation -->
<property>
<name>fs.cv.impl</name>
<value>io.curvine.CurvineFileSystem</value>
</property>

<!-- Single cluster configuration -->
<property>
<name>fs.cv.master_addrs</name>
<value>master1:8995,master2:8995,master3:8995</value>
</property>

Multi-cluster Support

<!-- Cluster 1: Production environment -->
<property>
<name>fs.cv.production.master_addrs</name>
<value>prod-master1:8995,prod-master2:8995,prod-master3:8995</value>
</property>

<!-- Cluster 2: Development environment -->
<property>
<name>fs.cv.development.master_addrs</name>
<value>dev-master1:8995,dev-master2:8995</value>
</property>

<!-- Cluster 3: Machine learning dedicated cluster -->
<property>
<name>fs.cv.ml-cluster.master_addrs</name>
<value>ml-master1:8995,ml-master2:8995,ml-master3:8995</value>
</property>

🔄 UFS Transparent Proxy

To better support existing Java applications to seamlessly access Curvine cache, we provide a UFS transparent proxy solution. The core advantage of this solution is zero code modification, allowing existing applications to immediately enjoy the cache acceleration effects of Curvine.

✨ Core Features of Transparent Proxy

  • 🚫 Zero code modification: Preserves all original interfaces unchanged, no business code modifications required
  • 🔍 Intelligent path recognition: Only determines whether the path has been mounted to Curvine when opening a file
  • ⚡ Automatic cache acceleration: Automatically enables cache acceleration for mounted paths, native S3 access for unmounted paths
  • 🔄 Smooth switching: Supports dynamically switching whether to use the cache at runtime without restarting the application

🛠️ Configuration Method

Simply replace the S3FileSystem implementation class in Hadoop configuration:

<!-- Traditional S3 access configuration -->
<!--
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
-->

<!-- Replace with Curvine transparent proxy -->
<property>
<name>fs.s3a.impl</name>
<value>io.curvine.S3AProxyFileSystem</value>
</property>

<property>
<name>fs.cv.impl</name>
<value>io.curvine.CurvineFileSystem</value>
</property>

<!-- Curvine cluster configuration -->
<property>
<name>fs.curvine.master_addrs</name>
<value>master1:8995,master2:8995,master3:8995</value>
</property>

🔧 Working Principle

(working principle diagram)

🚀 Usage Example

No need to modify any business code, original code directly enjoys acceleration:

// Business code remains completely unchanged!
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);

// If this path is mounted to Curvine, automatically enjoy cache acceleration
FSDataInputStream input = fs.open(new Path("s3a://my-bucket/warehouse/data.parquet"));

// If this path is not mounted, use native S3 access
FSDataInputStream input2 = fs.open(new Path("s3a://my-bucket/archive/old-data.parquet"));

Spark/MapReduce code example:

// Spark code does not need any modification
Dataset<Row> df = spark.read()
.option("header", "true")
// If /warehouse/ path is mounted, automatically use cache acceleration
.csv("s3a://data-lake/warehouse/customer_data/");

df.groupBy("region")
.agg(sum("revenue").alias("total_revenue"))
.orderBy(desc("total_revenue"))
.show(20);

Python PySpark example:

# Python code also does not need modification
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum, desc

spark = SparkSession.builder.appName("TransparentCache").getOrCreate()

# Automatically determine whether to use cache
df = spark.read \
.option("header", "true") \
.csv("s3a://data-lake/analytics/events/")

result = df.groupBy("event_type") \
.agg(sum("count").alias("total_events")) \
.orderBy(desc("total_events"))

result.show()

Apache Spark Optimization Configuration

# Spark application startup configuration
spark-submit \
--class com.example.SparkApp \
--master yarn \
--deploy-mode cluster \
--conf spark.hadoop.fs.cv.impl=io.curvine.CurvineFileSystem \
--conf spark.hadoop.fs.cv.master_addrs=master1:8995,master2:8995,master3:8995 \
--conf spark.sql.adaptive.enabled=true \
--jars curvine-hadoop-client.jar \
app.jar

Spark Code Example

// Scala example
val spark = SparkSession.builder()
.appName("Curvine Demo")
.config("spark.hadoop.fs.cv.impl", "io.curvine.CurvineFileSystem")
.getOrCreate()

// Directly use cv:// protocol to access cached data
val df = spark.read
.option("multiline", "true")
.json("cv://production/warehouse/events/2024/01/01/")

df.groupBy("event_type")
.count()
.show()

// Multi-cluster access
val prodData = spark.read.parquet("cv://production/warehouse/sales/")
val mlData = spark.read.parquet("cv://ml-cluster/features/user_profiles/")

# Python example
from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName("Curvine Python Demo") \
.config("spark.hadoop.fs.cv.impl", "io.curvine.CurvineFileSystem") \
.config("spark.hadoop.fs.cv.master_addrs", "master1:8995,master2:8995") \
.getOrCreate()

# Read data from cache
df = spark.read.option("header", "true") \
.csv("cv://warehouse/customer_data/")

# Complex queries automatically enjoy cache acceleration
result = df.groupBy("region") \
.agg({"revenue": "sum", "orders": "count"}) \
.orderBy("sum(revenue)", ascending=False)

result.show(20)

Trino/Presto Plugin Integration

Curvine provides an intelligent path-replacement plugin that delivers completely transparent, non-invasive cache acceleration without requiring any business code modifications:

Plugin Workflow

(plugin workflow diagram)

Spark plugin usage example:

spark-submit \
--class main.scala.Tpch \
--name tpch_demo \
--conf spark.hadoop.fs.cv.impl=io.curvine.CurvineFileSystem \
--conf spark.hadoop.fs.cv.default.master_addrs=master1:8995,master2:8995 \
--conf spark.sql.extensions=io.curvine.spark.CurvineSparkExtension

// Flink Table API integration example
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Configure Curvine FileSystem
Configuration config = new Configuration();
config.setString("fs.cv.impl", "io.curvine.CurvineFileSystem");
config.setString("fs.cv.master_addrs", "master1:8995,master2:8995");

// Create Curvine table
tableEnv.executeSql(
"CREATE TABLE user_events (" +
" user_id BIGINT," +
" event_type STRING," +
" timestamp_ms BIGINT," +
" properties MAP<STRING, STRING>" +
") WITH (" +
" 'connector' = 'filesystem'," +
" 'path' = 'cv://streaming/events/'," +
" 'format' = 'json'" +
")"
);

// Real-time query enjoys cache acceleration
Table result = tableEnv.sqlQuery(
"SELECT user_id, COUNT(*) as event_count " +
"FROM user_events " +
"WHERE timestamp_ms > UNIX_TIMESTAMP() * 1000 - 3600000 " +
"GROUP BY user_id"
);


💡 Best Practices

🎯 Mounting Strategy Best Practices

Tiered Mounting by Business Scenarios

# Hot data: high-frequency access, using memory cache
bin/cv mount s3://bucket/hot /bucket/hot \
--storage-type=mem \
--replicas=3 \
--ttl-ms=1d \
--ttl-action=delete

# Warm data: regular access, using SSD cache
bin/cv mount s3://bucket/warm /bucket/warm \
--storage-type=ssd \
--replicas=2 \
--ttl-ms=7d \
--ttl-action=delete


# Cold data: low-frequency access, using disk cache
bin/cv mount s3://bucket/cold /bucket/cold \
--storage-type=disk \
--replicas=1 \
--ttl-ms=30d \
--ttl-action=delete

Optimization by Data Type

# Small file intensive (e.g., logs, configurations)
bin/cv mount s3://bucket/logs /bucket/logs \
--block-size=4MB \
--consistency-strategy=none

# Large file type (e.g., videos, models)
bin/cv mount s3://bucket/models /bucket/models \
--block-size=1GB \
--consistency-strategy=always

# Analytical data (e.g., Parquet)
bin/cv mount s3://bucket/analytics /bucket/analytics \
--block-size=128MB \
--consistency-strategy=none

🎯 Summary

As a new generation distributed caching system, Curvine provides excellent performance improvements for modern data-intensive applications through intelligent caching strategies, strong consistency guarantees, and seamless ecosystem integration.

🏆 Core Values

  • 🚀 Performance Improvement: 10-100x access acceleration, significantly reducing data access latency
  • 💰 Cost Optimization: Reduce cloud storage access costs, improve computing resource utilization
  • 🛡️ Data Security: Multiple consistency guarantees to ensure data accuracy and integrity
  • 🌐 Ecosystem Friendly: Seamless integration with mainstream big data and AI frameworks

Curvine - Make data access lightning fast ⚡

Building a Curvine Cluster from Scratch & FIO Testing

· 2 min read
Founder of Curvine

How can you quickly get started and try out Curvine's performance? This article walks through building a small local cluster from scratch so you can get hands-on experience quickly.

GitHub: https://github.com/CurvineIO/curvine


1. Download the Code:

git clone https://github.com/CurvineIO/curvine.git

2. Environment Requirements:

GCC: version 10 or later 
Rust: version 1.86 or later
Protobuf: version 3.x
Maven: version 3.8 or later
LLVM: version 12 or later
FUSE: libfuse2 or libfuse3 development packages
JDK: version 1.8 or later
npm: version 9 or later
Python: version 3.7 or later

3. Compile & Run

make all

To facilitate compilation, our build script will check dependencies in advance. For macOS users, we will temporarily skip FUSE compilation (currently not adapted for macOS). Interested users can consider using the macfuse project for adaptation.

make-checkenv

4. After Compilation, Start Local Cluster

cd build/dist
./bin/restart-all.sh

After successful startup, execute the report command to check if it's working:


bin/cv report

active_master: localhost:8995
journal_nodes: 1,localhost:8996
capacity: 233.5GB
available: 105.0GB (44.99%)
fs_used: 0.0B (0.00%)
non_fs_used: 128.4GB
live_worker_num: 1
lost_worker_num: 0
inode_num: 0
block_num: 0
live_worker_list: 192.168.xxx.xxx:8997,105.0GB/233.5GB (44.99%)
lost_worker_list:

5. View Local Master and Worker WebUI

http://localhost:9000/
http://localhost:9001/

(Web UI screenshot)

6. FIO Testing

Test Environment: Alibaba Cloud ecs.r8a.8xlarge, with one master, one worker, and one client

  • 32 cores (vCPU)
  • 256 GiB memory
  • System disk and data disk: ESSD cloud disk, 500 GiB each (7800 IOPS)
  • Maximum bandwidth: 25 Gbps

Prepare data (on worker machine):

bin/curvine-bench.sh fuse.write

FIO Sequential Read Test, 8 Concurrent Jobs

fio -iodepth=1 -rw=read -ioengine=libaio -bs=256k \
  -group_reporting -size=200gb \
  -filename=/curvine-fuse/fs-bench/0 \
  -name=read_test --readonly -direct=1 --runtime=60 \
  -numjobs=8

FIO Random Read Test, 8 Concurrent Jobs


fio -iodepth=1 -rw=randread -ioengine=libaio -bs=256k \
  -group_reporting -size=200gb \
  -filename=/curvine-fuse/fs-bench/0 \
  -name=read_test --readonly -direct=1 --runtime=60 \
  -numjobs=8

Finally, here's a video demonstration of the FIO testing results: