
StarRocks Compute-Storage Separation with Curvine

Founder of Curvine · 4 min read

In cloud deployments, compute-storage separation is an increasingly popular choice: object storage cuts cost and compute can scale independently. The trade-off many teams worry about is performance. How much slower is separation, and can a cache layer like Curvine make it competitive with colocated storage?

We ran a standard TPC-DS 100GB benchmark on StarRocks, comparing colocated (SSD/HDD) and compute-storage separation (Curvine + OSS, and raw OSS) under identical compute. Here’s what we found — and how a single parameter change eliminated an 8× slow-query gap.


1. Why This Benchmark?

We wanted to measure the impact of storage architecture and disk type on StarRocks query performance. Same workload, same hardware tier: 99 TPC-DS queries across 24 tables, with total runtime and average latency as the main metrics.

Environment (summary):

  • Compute: Alibaba Cloud ECS r6.8xlarge (32 vCPU, 256 GB RAM), 3-node cluster for all setups
  • Configurations (5): colocated SSD, colocated HDD, separation + Curvine SSD, separation + Curvine HDD, separation + raw OSS
  • Dataset: TPC-DS 100 GB, 99 standard complex queries
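
For context on what the "separation" configurations mean in StarRocks terms: in shared-data (compute-storage separation) mode, the object-storage backend is declared as a storage volume. The sketch below is illustrative only, assuming StarRocks 3.x syntax for an S3-compatible endpoint; the bucket, endpoint, region, and credentials are placeholders, not the values used in this benchmark, and the Curvine setups (which layer a cache in front of OSS) are not shown.

-- Illustrative sketch (not the benchmark's actual config): declare an
-- S3-compatible storage volume for shared-data mode and make it the default.
-- Bucket, endpoint, region, and keys are placeholders.
CREATE STORAGE VOLUME oss_volume
TYPE = S3
LOCATIONS = ('s3://example-bucket/starrocks')
PROPERTIES (
    "enabled" = "true",
    "aws.s3.endpoint" = "oss-cn-hangzhou.aliyuncs.com",
    "aws.s3.region" = "cn-hangzhou",
    "aws.s3.use_aws_sdk_default_behavior" = "false",
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

SET oss_volume AS DEFAULT STORAGE VOLUME;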

2. Main Takeaways: Four Findings

1. Architecture matters more than media.
Colocated storage consistently outperformed compute-storage separation, mainly because it avoids cross-node data transfer.

2. Curvine cache narrows the separation gap.
With separation, Curvine on SSD (362 s total) was ~16% faster than raw OSS (419 s). In on-prem or hybrid setups, using Curvine in front of object storage gives a clear performance win.

3. SSD wins across the board.
For both colocated and Curvine-based separation, SSD beat HDD, especially on large scans.

4. Raw OSS: best cost/performance for non–latency-critical workloads.
Raw OSS was the slowest but still only ~41% slower than the fastest colocated-SSD run. Given its low cost and elasticity, it remains a strong option when peak latency is not the main concern.


3. Results: Side-by-Side Numbers

Metric        | Colocated (SSD) | Colocated (HDD) | Separation (Curvine SSD) | Separation (Curvine HDD) | Separation (OSS)
------------- | --------------- | --------------- | ------------------------ | ------------------------ | ----------------
Total time    | 297.96 s        | 313.68 s        | 361.86 s                 | 385.20 s                 | 419.15 s
Relative      | Baseline (100%) | +5.3%           | +21.4%                   | +29.3%                   | +40.7%
Avg per query | 3.01 s          | 3.17 s          | 3.66 s                   | 3.89 s                   | 4.23 s
Slowest query | 52.94 s         | 54.40 s         | 54.87 s                  | 55.19 s                  | 52.66 s

(Relative compares each total time with the colocated-SSD baseline; for example, 361.86 s / 297.96 s ≈ 1.214, i.e. +21.4%.)

In short: colocated SSD is fastest (under 300 s total); Curvine-based separation closes much of the gap; and even raw OSS stays at ~4.23 s average per query, which is acceptable for many use cases.


4. Fixing Slow Queries: Two Case Studies

Two queries (Q05 and Q77) were 5–8× slower in the separation setup. Root-cause analysis led to a small change that brought them in line with colocated performance.

Query 05: Why 8× slower?

  • Observed: ~15 s (separation) vs ~1.8 s (colocated).
  • Cause: A 3-table union produced ~275M rows (15.33 GB); separation had to pull this across the network (~10 s of the 15 s).
  • Takeaway: In separation, data is read remotely and then transferred; colocated can read locally.
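If you need to confirm where the time goes in a case like this, the StarRocks query profile is the usual tool. A minimal sketch, assuming StarRocks 3.1+ profile statements; the query ID is a placeholder and the Q05 text is omitted:

-- Sketch: capture and inspect a profile for the slow run
-- (assumes StarRocks 3.1+; '<query_id>' is a placeholder).
SET enable_profile = true;

-- Run the slow query here (TPC-DS Q05 text omitted), then list recent profiles.
SHOW PROFILELIST;

-- Drill into the operator-level breakdown to compare scan time with exchange/network time.
ANALYZE PROFILE FROM '<query_id>';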

Query 77: Full-scan trap

  • Observed: ~11 s (separation) vs ~1.8 s (colocated).
  • Cause: Full scan of ~288M rows (14.43 GB); network I/O for remote reads was far slower than local SSD. The optimizer’s “pre-aggregation” strategy also disabled “dynamic filtering,” so unnecessary data was shipped.
  • Fix: Turn off aggregate pushdown so dynamic filtering can kick in:
SET cbo_push_down_aggregate_mode = -1;

Effect: Rows scanned dropped from 275M to 6.11M, and latency matched the colocated run.
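
A small sketch of how one might confirm the setting before re-running Q77; the variable check is standard MySQL-compatible syntax, and the plan inspection is only illustrative since EXPLAIN output differs across StarRocks versions:

-- Sketch: apply the article's setting for the current session and verify it.
SET cbo_push_down_aggregate_mode = -1;
SHOW VARIABLES LIKE 'cbo_push_down_aggregate_mode';

-- Re-inspect the Q77 plan (query text omitted); per the article, with
-- aggregate pushdown off, dynamic filtering prunes the scan instead of
-- shipping a pre-aggregated full scan.
-- EXPLAIN SELECT ... ;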


5. Summary and Recommendations

  • Highest performance: colocated SSD, for real-time analytics and high concurrency.
  • On-prem / cost-sensitive: Curvine-based compute-storage separation (SSD backend when possible), a good balance of performance and elasticity.
  • Public cloud, large storage: raw OSS separation, the best cost efficiency for non-real-time analytics.
  • Already on separation: apply the optimizer setting above to avoid the slow-query pattern we hit (see the sketch after this list); expect large gains on similar queries.
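
For that last point: if most of your workload looks like Q77, the setting can also be made cluster-wide instead of per session. A sketch, assuming the variable is settable globally in your StarRocks version; weigh it against queries that do benefit from aggregate pushdown:

-- Sketch: persist the article's setting for all new sessions cluster-wide,
-- instead of setting it in each session.
SET GLOBAL cbo_push_down_aggregate_mode = -1;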

Compute-storage separation does not have to mean “slow.” With Curvine as a high-performance cache layer and a few targeted optimizations, StarRocks can deliver strong query performance while keeping the cost and elasticity benefits of object storage.


Original article (Chinese): 实测!StarRocks存算分离性能大揭秘:Curvine缓存竟有这波神操作 (roughly: "Hands-on test: the inside story of StarRocks compute-storage separation performance, and the Curvine cache's clever trick").