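This post continues the analysis of the TSM file format. The example in the previous post gave a rough picture of the format, but because it was a single simple case, it did not show how compression behaves with multiple keys, multiple fields, different field types, or different time intervals.

This post starts with a file that contains multiple series keys.

Worked example

Here is a file with multiple keys: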

[Figure: TSM File Format and Worked Examples (2): Layout of Multiple Series Keys]

Following the structure worked out in the previous post, start with the indexes, which begin at offset 0x77.

Indexes section

index1:
key_len = 00 25, 37 bytes
key = cars,brand=bmw,model=320li#!~#mileage
type = 00, float
count = 00 01, 1 block index entry
min time = 154EAC802080A926, 1535354189281012006
max time = 154EAC802080A926, 1535354189281012006
offset = 00 00 00 00 00 00 00 05, 5
datasize = 00 00 00 22, 34 bytes

index2:
key_len = 00 22, 34 bytes
key = cars,brand=bmw,model=x5#!~#mileage
type = 00, float
count = 00 01, 1 block index entry
min time = 154EAC802080AD0E, 1535354189281013006
max time = 154EAC802080C47E, 1535354189281019006
offset = 0000000000000027, 39
datasize = 00 00 00 2E, 46 bytes

index3:
key_len = 00 25, 37 bytes
key = cars,brand=honda,model=fit#!~#mileage
type = 00, float
count = 00 01, 1 block index entry
min time = 154EAC802080A53E, 1535354189281011006
max time = 154EAC802080A53E, 1535354189281011006
offset = 0000000000000055, 85
datasize = 00 00 00 22, 34 bytes

With the indexes, the position of each block can be located exactly.
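The field layout listed above can be read back with a few fixed-width big-endian reads. The following is a minimal sketch, not InfluxDB's own reader: the function name `parse_index_entry` and the dictionary keys are made up for illustration, and the input bytes are rebuilt from the index1 fields shown above rather than read from the file.

```python
import struct

def parse_index_entry(buf, pos=0):
    """Parse one index entry; return (entry, position just past the entry)."""
    (key_len,) = struct.unpack_from(">H", buf, pos)          # 2-byte key length
    pos += 2
    key = buf[pos:pos + key_len].decode("ascii")
    pos += key_len
    block_type, count = struct.unpack_from(">BH", buf, pos)   # 1-byte type, 2-byte count
    pos += 3
    blocks = []
    for _ in range(count):
        # min time (8), max time (8), offset (8), size (4) = 28 bytes per block entry
        min_t, max_t, offset, size = struct.unpack_from(">qqqI", buf, pos)
        pos += 28
        blocks.append({"min_time": min_t, "max_time": max_t,
                       "offset": offset, "size": size})
    return {"key": key, "type": block_type, "blocks": blocks}, pos

# index1, rebuilt from the values listed above (hypothetical input, not a file read)
key = b"cars,brand=bmw,model=320li#!~#mileage"
raw = (bytes.fromhex("0025") + key + bytes.fromhex("00" "0001")
       + bytes.fromhex("154EAC802080A926" "154EAC802080A926")
       + bytes.fromhex("0000000000000005" "00000022"))

entry, end = parse_index_entry(raw)
print(entry["key"], entry["blocks"][0])
```

Returning the end position lets the same function be called repeatedly to walk index1, index2 and index3 back to back.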

Blocks section

Now parse the blocks.

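block1, offset 5, datasize 34.
CRC32 = 36 52 0D A3
Block type = 00, float
timestamplen = 09, 9
Encodingtype = 1, PackedSimple
exponent = C, 12
timestamp = 154EAC802080A926, 1535354189281012006
compression type = 10, Gorilla paper encoding
First 8 bytes: 409F400000000000, which is 2000 as an IEEE 754 double
The remaining 10 bytes, C5F7ECE8000000000020, looked at first like overhead from the Gorilla compression. They are consistent with an end-of-stream marker: a NaN XORed against the previous value and then Gorilla-encoded.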
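The two float observations for block1 can be checked with a short script: the first 8 bytes decode as a big-endian double, and the trailing 10 bytes can be compared with what a Gorilla-style encoder would emit for a NaN written after that value. This is a sketch under assumptions: `end_marker_bytes` is a hypothetical helper that handles only the first XOR record of a stream, and `NAN_BITS` is the bit pattern of Go's `math.NaN()`.

```python
import struct

NAN_BITS = 0x7FF8000000000001  # assumption: the NaN bit pattern Go's math.NaN() returns

def decode_first_value(raw8):
    """The first value of the stream is stored as a raw big-endian IEEE 754 double."""
    return struct.unpack(">d", raw8)[0]

def end_marker_bytes(prev_bits):
    """Gorilla-style encoding of a NaN that follows a value with bit pattern prev_bits.

    Simplified: covers only the first XOR record, where a new window
    (leading-zero count and significant-bit count) is always written.
    """
    xor = prev_bits ^ NAN_BITS
    leading = min(64 - xor.bit_length(), 31)       # leading zeros, capped to fit 5 bits
    trailing = (xor & -xor).bit_length() - 1       # trailing zeros
    sig = 64 - leading - trailing                  # number of significant bits
    bits = ("1"                                    # value differs from the previous one
            + "1"                                  # a new window follows
            + format(leading, "05b")
            + format(sig & 0x3F, "06b")            # 64 is stored as 0
            + format(xor >> trailing, "0%db" % sig))
    bits += "0" * (-len(bits) % 8)                 # pad to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(decode_first_value(bytes.fromhex("409F400000000000")))   # 2000.0
print(end_marker_bytes(0x409F400000000000).hex())              # c5f7ece8000000000020
```

The second line of output matches the 10 trailing bytes of block1, which is what suggests they are an end marker rather than padding.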

block2, offset 39, datasize 46. This block is identical to the example in the previous post.
CRC32 = C24CB839
Block type = 00, float
timestamplen = 0B, 11
Encodingtype = 2, RLE
exponent = 3
starttimestamp = 154EAC802080AD0E, 1535354189281013006
run length value = 01
count = 07
time offset = run length value 1 * 10 ^ exponent 3 = 1000
7 timestamps in total:
1535354189281013006
1535354189281014006
1535354189281015006
1535354189281016006
1535354189281017006
1535354189281018006
1535354189281019006
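Each of these pairs with one of the 7 values that follow.

compression type = 10, Gorilla paper encoding
The next 24 bytes Gorilla-decode to 7 values:
2300,2400,2500,2600,2700,2800,2900
The remaining 00 00 00 20 appears to be trailing encoder output (possibly the tail of an end-of-stream marker), not a value.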
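The RLE expansion used for block2's timestamps is simple enough to write out. The sketch below assumes the four inputs have already been read from the block (start timestamp, run length value, exponent, count); the function name `decode_rle_timestamps` is illustrative only.

```python
def decode_rle_timestamps(start, run_value, exponent, count):
    """Expand an RLE timestamp run: every step is run_value * 10**exponent."""
    step = run_value * 10 ** exponent
    return [start + i * step for i in range(count)]

# Values taken from block2 above
timestamps = decode_rle_timestamps(0x154EAC802080AD0E, 1, 3, 7)
for t in timestamps:
    print(t)
```

Seven timestamps are thus stored in 11 bytes: one header byte, 8 bytes for the start, and one byte each for the step and the count.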

block3, offset 85, datasize 34.
CRC32 = 85E142D9
Block type = 00, float
timestamplen = 09, 9
Encodingtype = 1, PackedSimple
exponent = C, 12
timestamp = 154EAC802080A53E, 1535354189281011006
compression type = 10, Gorilla paper encoding
First 8 bytes: 40C3880000000000, which is 10000 as an IEEE 754 double
The remaining 10 bytes, C5F7E771000000000020, have the same shape as in block1: they are consistent with a Gorilla-encoded NaN end-of-stream marker rather than padding.
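Putting the pieces of a single-point float block together, the layout is: CRC32 (4 bytes), block type (1), timestamp length (1 here), timestamp section, then the value section. The sketch below rebuilds block3 from the fields listed above. Two assumptions to flag: the timestamp header byte `1C` is inferred from encoding type 1 and exponent C sharing one byte, and the timestamp length is treated as a single byte, which only holds for lengths below 128. `split_float_block` is a hypothetical helper.

```python
import struct

def split_float_block(block):
    """Split a block into CRC, type, timestamp section and value section."""
    crc = block[0:4]
    block_type = block[4]
    ts_len = block[5]                  # single-byte length (assumed < 128)
    ts = block[6:6 + ts_len]
    values = block[6 + ts_len:]
    return crc, block_type, ts, values

# block3, rebuilt from the fields listed above
block3 = bytes.fromhex(
    "85E142D9"                 # CRC32
    "00"                       # block type: float
    "09"                       # timestamp section length
    "1C" "154EAC802080A53E"    # header byte (assumed 0x1C) + raw first timestamp
    "10" "40C3880000000000"    # float header + raw first value
    "C5F7E771000000000020"     # trailing bytes
)

crc, block_type, ts, values = split_float_block(block3)
ts_encoding, exponent = ts[0] >> 4, ts[0] & 0x0F
first_ts = struct.unpack(">q", ts[1:9])[0]
first_value = struct.unpack(">d", values[1:9])[0]
print(len(block3), ts_encoding, exponent, first_ts, first_value)
```

The total length of 34 bytes agrees with the datasize recorded for this block in index3.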

Putting it all together gives 9 points:

SeriesKey	timestamp	fieldvalue
cars,brand=bmw,model=320li#!~#mileage	1535354189281012006	2000
cars,brand=bmw,model=x5#!~#mileage	1535354189281013006	2300
cars,brand=bmw,model=x5#!~#mileage	1535354189281014006	2400
cars,brand=bmw,model=x5#!~#mileage	1535354189281015006	2500
cars,brand=bmw,model=x5#!~#mileage	1535354189281016006	2600
cars,brand=bmw,model=x5#!~#mileage	1535354189281017006	2700
cars,brand=bmw,model=x5#!~#mileage	1535354189281018006	2800
cars,brand=bmw,model=x5#!~#mileage	1535354189281019006	2900
cars,brand=honda,model=fit#!~#mileage	1535354189281011006	10000

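Conclusion:

The indexes appear in the same order in which the blocks are stored. When every index entry holds exactly one block index entry, indexes and blocks correspond one to one and can be viewed as two parallel columns.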
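The ordering claim can be checked against the numbers in this example: if blocks are laid out in index order, each block should start where the previous one ends, and the last block should end where the indexes begin. A minimal sketch, with the hypothetical helper `check_contiguous` and the (key, offset, size) triples copied from the three index entries:

```python
HEADER_SIZE = 5  # the first block starts at offset 5 in this file

index_entries = [
    ("cars,brand=bmw,model=320li#!~#mileage", 5, 34),
    ("cars,brand=bmw,model=x5#!~#mileage", 39, 46),
    ("cars,brand=honda,model=fit#!~#mileage", 85, 34),
]

def check_contiguous(entries, start):
    """Return the end offset if every block begins where the previous one ended."""
    expected = start
    for key, offset, size in entries:
        if offset != expected:
            raise ValueError("gap or overlap before block of " + key)
        expected = offset + size
    return expected

end_of_blocks = check_contiguous(index_entries, HEADER_SIZE)
print(hex(end_of_blocks))  # 0x77, where the indexes section starts
```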

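By extension, if one key has multiple block index entries, those blocks are presumably stored in order as well, for example key1 pointing to 2 consecutive blocks.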

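Similarly, within a block the timestamps and field values are stored in the same order and correspond one to one.

For example, the 7 timestamps and 7 values in block2 can also be viewed as two columns.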
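Viewed as two columns, the timestamps and values of a block pair up by position. A small sketch using block2's decoded data, where the variable names are illustrative:

```python
series_key = "cars,brand=bmw,model=x5#!~#mileage"

# Column 1: the 7 decoded timestamps of block2
timestamps = [1535354189281013006 + i * 1000 for i in range(7)]
# Column 2: the 7 decoded values of block2
values = [2300, 2400, 2500, 2600, 2700, 2800, 2900]

# Pairing by position rebuilds the points of this series
points = [(series_key, t, v) for t, v in zip(timestamps, values)]
for point in points:
    print(*point, sep="\t")
```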

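What if the same timestamp had multiple values? That cannot happen: writing a second value would simply update the existing point.

Summary:

This post analyzed an example with multiple series keys and showed what the parallel structure of TSM means.

The next post will look at multiple fields: how multiple field values are stored in blocks, and how the parallel structure is preserved.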
