Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadV2 in TsFile #144

Merged
merged 34 commits into from
Mar 7, 2018
Merged

ReadV2 in TsFile #144

merged 34 commits into from
Mar 7, 2018

Conversation

xingtanzjr
Copy link
Collaborator

@xingtanzjr xingtanzjr commented Jan 30, 2018

}

private String getUUID() {
return filePath + offsetInFile + lengthOfBytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offsetInFile和lengthOfBytes都是long,直接连接起来是不是不容易分开,需要考虑加一个分隔符吗?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个生成方法不准确,比如offsetInFile=101,lengthOfBytes=0和offsetInFile=10,lengthOfBytes=10的情况是一样的
还是判断这三者是否相等吧

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,加一个分隔符

/**
* Created by zhangjinrui on 2017/12/25.
*/
public interface MetadataQuerier {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的Metadata是特指seriesChunk的metadata,对吗

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个Metadata泛指TsFile末尾的metadata,但是这个接口的作用主要是给定一个Path,返回该Path的SeriesChunkMetadataList。当前在TsFile中,实现的是返回一个文件中对应Path的List

randomAccessFileReader.read(buf, 0, buf.length);

ByteArrayInputStream metadataInputStream = new ByteArrayInputStream(buf);
this.fileMetaData = new TsFileMetaDataConverter().toTsFileMetadata(ReadWriteThriftFormatUtils.readFileMetaData(metadataInputStream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我对这种new一个对象只为调用它的一个方法的做法表示疑问,或许实现为静态方法是个更好的选择。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MyXOF 这个类你看一下?如果该类在调用方法的时候,不涉及class级别的变量的话,是可以用static的

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

之后应该会去掉thrift,这个暂时不改了

try {
return seriesChunkDescriptorCache.get(path);
} catch (CacheException e) {
throw new IOException("Get SeriesChunkDescriptorList Error.", e);
Copy link
Contributor

@jt2594838 jt2594838 Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我注意到这里似乎存在IOException -> CacheException -> IOException的过程,那么中间的这一步转换是否必要?
另外,Exception message中如果带上参数(path),可能会比较助于调试。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实存在这个转换,我也有点疑问,就是说有没有什么好办法让这个地方更优雅一点。这么设计的主要原因是,Cache作为一个独立的类,他的异常不能是IOException,这个应用场景下会发生这种转换,其他应用场景下,转换可能就是AException -> CacheException -> BException了。

Path加在ErrorMsg里面能更清晰,加一下

return encodedSeriesChunkDescriptorList;
}

private EncodedSeriesChunkDescriptor generateSeriesChunkDescriptorByMetadata(TimeSeriesChunkMetaData timeSeriesChunkMetaData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EncodedSeriesChunkDescriptor为何不直接提供一个基于TimeSeriesChunkMetaData的构造函数?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个提议不错,当时没考虑到。

目前这么实现的一个原因是,TimeSeriesChunkMetadata里面没有filePath这个属性,为了照顾这个点,所以提供了好几个不同的构造函数,直接传入了对应变量的值。

* Created by zhangjinrui on 2017/12/26.
*/
public interface SeriesChunkLoader {
SeriesChunk getMemSeriesChunk(EncodedSeriesChunkDescriptor encodedSeriesChunkDescriptor) throws IOException;
Copy link
Contributor

@jt2594838 jt2594838 Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

函数名中的Mem是指什么?是指返回对象将被放到内存中还是函数将从内存中获取这个对象?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1,我在前面也对这个命名有点疑问

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MemSeriesChunk是指被加载到内存中的SeriesChunk,即对应SeriesChunk的所有bytes被加载到内存,并且存储在该对象中

byte[] buf = new byte[seriesChunkLength];
randomAccessFileReader.seek(encodedSeriesChunkDescriptor.getOffsetInFile());
int readLength = randomAccessFileReader.read(buf, 0, seriesChunkLength);
if (readLength != seriesChunkLength) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我似乎看到有些地方做了这个检查,有些地方没有。

Copy link
Collaborator Author

@xingtanzjr xingtanzjr Feb 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

根据之前的经验,这个地方是要加这个判断的,因为有时候系统读出来的字节数会有问题(当年遇到过这个坑)

@@ -42,4 +44,8 @@ public boolean equals(Object object) {
}
return false;
}

public int getSize() {
return 4 + 8 + value.getSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个4指的是?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

估计是我弄错了,对象引用是几个字节来着?

Copy link
Contributor

@jt2594838 jt2594838 Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不开启引用压缩的情况下是8个,开启引用压缩的情况(大部分)是4个。我不清楚我们有没有开启引用压缩。开启引用压缩会造成一定的性能下降。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里有结论吗?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用8个

public class QueryExpression {
private List<Path> selectedSeries;
private QueryFilter queryFilter;
private boolean hasQueryFilter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasQueryFilter和queryFilter == null是等价的吗?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

理论是上,但是queryFilter == null这是一种潜在的约定,感觉不是很规范。加了一个hasQueryFIlter()方法会更规范,假如某一天修改了hasQueryFIlter()不会对使用该类的其他类产生未知的影响。

/**
* Created by zhangjinrui on 2017/12/26.
*/
public class QueryDataSetForQueryWithQueryFilterImpl implements QueryDataSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

很想吐槽这个名字。。。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

说实话我也想吐槽,当时没想出更好的名字,有啥推荐的名字没,比如QueryWithFilterDataSet?

if (regularQueryFilter instanceof GlobalTimeFilter) {
return new QueryWithGlobalTimeFilterExecutorImpl(seriesChunkLoader, metadataQuerier).execute(queryExpression);
} else {
return new QueryWithQueryFilterExecutorImpl(seriesChunkLoader, metadataQuerier).execute(queryExpression);
Copy link
Contributor

@jt2594838 jt2594838 Jan 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同样地,一个对象被创建仅仅是为了调用它的一个方法。
也许把seriesChunkLoader, metadataQuerier,queryExpression合起来做成一个静态函数可以达到同样的效果。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有道理。

这个地方这么设计的一个原因是,没有把excutor看做是一个执行引擎,而是一个“执行者”这样的概念,它其实是可以创建一次,执行多次的,而且内部可能需要维护一些当前执行状态的属性,只不过在这个地方都没有用到

//LOG.info(jsonSchema.toString());
FileSchema schema = new FileSchema(jsonSchema);
TSFileDescriptor.getInstance().getConfig().groupSizeInByte = rowGroupSize;
TSFileDescriptor.getInstance().getConfig().maxNumberOfPointsInPage = pageSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个配置项似乎没有改回去,不会对其他测试造成影响吗?

Copy link
Collaborator Author

@xingtanzjr xingtanzjr Feb 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我改一下

while (timestampGenerator.hasNext()) {
// System.out.println(timestampGenerator.next());
Assert.assertEquals(startTimestamp, timestampGenerator.next());
startTimestamp += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不需要验证startTimestamp的结束值吗?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对,需要验证一下,不然玩意hasNext直接返回false,UT测不出来

}

private String getUUID() {
return filePath + offsetInFile + lengthOfBytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个生成方法不准确,比如offsetInFile=101,lengthOfBytes=0和offsetInFile=10,lengthOfBytes=10的情况是一样的
还是判断这三者是否相等吧

hasQueryFilter = false;
}

public static QueryExpression create() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private的构造函数,static的构造方法有必要吗?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

一种创建表达式的写法风格...感觉应该还好

cachedLeftValue = leftValue;
return rightValue;
} else {
return leftValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里二者相等的话,是不是也要向上面那样,hasCachedLeftValue和hasCachedRightValue都设为true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个逻辑不是这样的,cache某个值的意思是这个值已经被拿出来了,但是还没弄到它。当left==right的时候,说明两个时间戳相等,需要合并为一个时间戳,返回之后谁都不需要被cache


@Override
public int compareTo(Point o) {
return timestamp - o.timestamp > 0 ? 1 : timestamp - o.timestamp < 0 ? -1 : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这我看晕了

throw new IOException("No more timeValuePair in current MemSeriesChunk");
}

private boolean constructPageReaderIfNextSatisfiedPageExists() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数名字感觉把注释要写的内容也包含进去了

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

哈哈哈哈,可以商讨


/**
* @author Jinrui Zhang
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some java doc for this class?
By the way, I think it's not easy to understand these class without documents

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方确实需要文档,我后面补一下比较详细的

public class ReadOnlyTsFile {

private ITsRandomAccessFileReader randomAccessFileReader;
private MetadataQuerier metadataQuerier;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two variables could be converted to local variables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Querier can be renamed to reader

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Beyyes 啥意思?
@qiaojialin Comment Reject.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

该问题忽略

@Beyyes
Copy link
Member

Beyyes commented Feb 1, 2018

其实还有个较大的问题,你现在QueryDataSet返回的内容RowRecord,应该没有考虑空值的情况?在IoTDB的group by语句里,其实是有空值的情况的,感觉可以在QueryDataSet里保留一下这个处理逻辑。~!

这个暂时先不处理,考虑以后构造一个更加抽象的能够容纳各种查询结果的QueryDataSet,以用在IoTDB中

nextSeriesChunkIndex = 0;
}

public SeriesReaderFromSingleFileByTimestampImpl(ITsRandomAccessFileReader randomAccessFileReader, Path path) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this method is not used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IoTDB里面用的...

* Created by zhangjinrui on 2017/12/25.
*/
public class EncodedSeriesChunkDescriptor implements SeriesChunkDescriptor {
private String filePath;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the other aggregation values also needed? such as max value, min value, sum, mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有道理,这个后面根据情况加一下

/**
* Created by zhangjinrui on 2017/12/25.
*/
public class MemSeriesChunk implements SeriesChunk{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming of this class is not appropriate? Memroy Serius Chunk? What's the meaning?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SeriesChunk本来是指在TsFile(磁盘上的文件)中的概念,这里表示被加载到内存中的SeriesChunk

*/
public class PageReader implements TimeValuePairReader {

private TSDataType dataType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PageReader每次都读一个点,从磁盘IO的角度(读多列,每列都会是随机io)来考虑,是不是效率会比一次把page都读出来效率低一些?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每次只解析一个点,但是Page的所有字节是一次性从磁盘上加载进来的

@MyXOF
Copy link
Contributor

MyXOF commented Feb 5, 2018

包名里面有V2的这次需要去掉吗

private void resetTimeStamp() {
if (totalValueCount == 0)
minTimestamp = -1;
}

@Override
public void writeToFileWriter(TsFileIOWriter writer, Statistics<?> statistics) throws IOException {
if(minTimestamp==-1){
LOG.error("Write page error, {}, minTime:{}, maxTime:{}",desc,minTimestamp,maxTimestamp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里如果出错了,还继续执行下去?

Copy link
Contributor

@MyXOF MyXOF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最后几处意见

@@ -44,6 +44,9 @@ public void writePage(ListByteArrayOutputStream listByteArray, int valueCount, S
// compress the input data
if (this.minTimestamp == -1)
this.minTimestamp = minTimestamp;
if(this.minTimestamp==-1){
LOG.error("Write page error, {}, minTime:{}, maxTime:{}",desc,minTimestamp,maxTimestamp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里如果出错了,还继续执行下去?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要,这个是当时测试出来的错误,如果出问题想用error日志记录下来。


public void startRowGroup(String deltaObjectId) {
LOG.debug("start row group:{}", deltaObjectId);
currentRowGroupMetaData = new RowGroupMetaData(deltaObjectId, 0, 0, new ArrayList<>(), "");// FIXME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// FIXME
是什么意思?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我也不知道以前就有,删除了

return new TsInt(valueDecoder.readInt(valueInputStream));
case INT96:
case FIXED_LEN_BYTE_ARRAY:
case BIGDECIMAL:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这边加一个default?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@jt2594838
Copy link
Contributor

没有别的意见。

@MyXOF MyXOF merged commit ed139d6 into master Mar 7, 2018
@Beyyes Beyyes deleted the f_zjr_read_rebuild_pr branch May 16, 2018 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tsfile写入时TimeSeriesChunkMetaData中的startTime会出现-1的情况
7 participants