CarbonData Introduction and Docs Notes
CarbonData
Apache CarbonData is a new big data file format for faster interactive queries. It uses:
- advanced columnar storage,
- indexing,
- compression, and
- encoding
techniques to improve computing efficiency, which helps speed up queries by an order of magnitude over petabytes of data.
CarbonData Introduction
- Unique Data Organization
  Stores data in columnar format, with each data block (row group) sorted independently of the others to allow faster filtering and better compression.
- Multi-Level Indexing
  Uses multiple indices at various levels to enable faster search and speed up query processing.
- Seamless Integration with the Big Data Ecosystem
  Deep Spark integration with DataFrame and SQL compliance.
- Advanced Pushdown Optimizations
  Pushes much of the query processing close to the data to minimize the amount of data read, processed, converted, transmitted, and shuffled.
- Dictionary Encoding
  Encodes data for reduced storage space and faster processing.
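
To make the data organization and indexing points more concrete, here is a minimal Python sketch of how independently sorted blocks with a per-block min/max index let an equality filter skip most blocks. It is not CarbonData's actual implementation or file layout. The names `Block`, `build_blocks`, and `filter_equals` are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical row group holding one column, sorted within the block."""
    values: List[int]

    @property
    def min_value(self) -> int:
        # Values are sorted, so the first entry is the minimum.
        return self.values[0]

    @property
    def max_value(self) -> int:
        # Values are sorted, so the last entry is the maximum.
        return self.values[-1]


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks and sort each block independently."""
    return [
        Block(sorted(column[i:i + block_size]))
        for i in range(0, len(column), block_size)
    ]


def filter_equals(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Return (matching values, number of blocks actually scanned)."""
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        # Block-level index: skip blocks whose [min, max] range excludes target.
        if target < block.min_value or target > block.max_value:
            continue
        scanned += 1
        # Within a sorted block, binary search finds the matching run.
        lo = bisect_left(block.values, target)
        hi = bisect_right(block.values, target)
        matches.extend(block.values[lo:hi])
    return matches, scanned


column = [5, 3, 9, 1, 42, 40, 47, 41, 20, 25, 22, 29]
blocks = build_blocks(column, block_size=4)
print(filter_equals(blocks, 22))  # ([22], 1): only one of three blocks is read
```

In this sketch the filter is evaluated against block metadata before any values are read, which is the same idea in miniature as pushing the predicate down to the storage layer.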
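
Dictionary encoding can be illustrated with a short, generic Python sketch. It shows the general technique only. CarbonData's own dictionary format is not described in these notes, so the function names below (`dictionary_encode`, `dictionary_decode`, `count_equal`) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each distinct value with a small integer code."""
    dictionary: List[str] = []       # code -> original value
    index_of: Dict[str, int] = {}    # original value -> code
    codes: List[int] = []
    for value in values:
        code = index_of.get(value)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(dictionary)
            index_of[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def count_equal(dictionary: List[str], codes: List[int], target: str) -> int:
    """Evaluate an equality filter on integer codes instead of strings."""
    if target not in dictionary:
        return 0  # the value never occurs, so no row can match
    target_code = dictionary.index(target)
    return sum(1 for code in codes if code == target_code)


cities = ["Shenzhen", "Beijing", "Shenzhen", "Shanghai", "Beijing", "Shenzhen"]
dictionary, codes = dictionary_encode(cities)
print(dictionary)  # ['Shenzhen', 'Beijing', 'Shanghai']
print(codes)       # [0, 1, 0, 2, 1, 0]
print(count_equal(dictionary, codes, "Shenzhen"))  # 3
```

The storage saving comes from keeping each distinct string once and storing small integers per row. The processing gain comes from comparing integers rather than strings during filtering.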