[writing] When batched reporting is used, the xxxRow fields in the progress report are incorrect #32

Open
ShiyiWei1995 opened this issue Aug 4, 2023 · 2 comments

Steps to reproduce

  • Start bench with generator-batch-size set to 2. The exact command is:
/opt/ymatrix/matrixdb5/bin/mxbench run --generator-empty-value-ratio 10 --generator-randomness M --workspace "/tmp/mxbench" --metrics-type int8 --storage-type mars3 --schema-name writing --tag-num 100 --writer-progress-include-table-size true --generator telematics --db-master-host ip-10-0-48-24.cn-northwest-1.compute.internal --table-name writing_20230804113824 --ts-start "2023-08-04 03:38:24" --writer-mxgate-path /opt/ymatrix/matrixdb5/bin/mxgated --ts-end "2023-08-04 11:38:24" --generator-batch-size 2 --skip-set-gucs true --writer-progress-format json --writer-progress-with-timezone true --db-master-port 5432 --total-metrics-count 100 --generator-disorder-ratio 1 --benchmark nil --writer stdin --db-database postgres --db-user mxadmin --ts-step-in-second 1
  • bench prints the write progress as JSON, in which the totalRows field is 5760000:
{"start":"2023-08-04 03:38:33 +0000","end":"2023-08-04 03:38:38 +0000","period":"5.017314889s","currTotalRows":232800,"totalRows":5760000,"currPeriodRows":232800,"currTotalBytes":226396246,"totalBytes":2177280000,"currPeriodBytes":226396246,"writtenMxGateTotal":226396246,"currPeriodWrittenMxGate":226396246,"tableSize":20747838}
[Screenshot 2023-08-04 13 56 36]
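
For anyone checking the counters programmatically, the progress line can be decoded with the standard library. This is only a sketch: the struct and function names are hypothetical and not part of mxbench, and only the row-related keys shown in the JSON above are mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// progressReport holds the row counters from one progress line.
// The type name is hypothetical; the JSON keys are those shown in the issue.
type progressReport struct {
	CurrTotalRows  int64 `json:"currTotalRows"`
	TotalRows      int64 `json:"totalRows"`
	CurrPeriodRows int64 `json:"currPeriodRows"`
}

// parseProgress decodes one JSON progress line; unmapped keys are ignored.
func parseProgress(line string) (progressReport, error) {
	var p progressReport
	err := json.Unmarshal([]byte(line), &p)
	return p, err
}

func main() {
	// Row-related subset of the progress line quoted above.
	line := `{"currTotalRows":232800,"totalRows":5760000,"currPeriodRows":232800}`
	p, err := parseProgress(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%d of %d (%.2f%%)\n", p.CurrTotalRows, p.TotalRows,
		100*float64(p.CurrTotalRows)/float64(p.TotalRows))
}
```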

Problem description

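According to the earlier definition, totalRows should equal number of devices × (collection end time − collection start time) / collection interval, i.e. the final number of rows stored in the table.
Currently totalRows equals the final row count × the number of batches per row, i.e. the total number of batches.
The other xxxRow fields have the same problem.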

@jinfudong jinfudong self-assigned this Aug 4, 2023
@jinfudong
Contributor

// mxbench/internal/engine/generator/telematics/telematics.go

func (g *Generator) GetPrediction(table *metadata.Table) (engine.GeneratorPrediction, error) {
	linesPerRow := float64(g.cfg.BatchSize)
	linesInTable := float64(g.gcfg.TagNum) * float64(int64(g.gcfg.EndAt.Sub(g.gcfg.StartAt))/
		(int64(g.gcfg.TimestampStepInSecond)*int64(time.Second)))

	amountSize := float64(metadata.ColumnSizeTimestamp+metadata.ColumnSizeVin) + float64(table.SingleRowMetricsSize())
	amountSize = amountSize * linesInTable *
		(1 - float64(g.cfg.emptyValueRatio)/100)

	return engine.GeneratorPrediction{
		Count: int64(linesPerRow * linesInTable),
		Size:  int64(amountSize),
	}, nil
}

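The code above is where mxbench computes totalRows: the value is the final row count × the number of batches per row.
Batching means splitting one row of data into several lines that are inserted separately. Without this way of counting, suppose batchSize=2 and one input row A is split into a1 and a2: once a1 has been stored but a2 has not, there is no way to compute the row count for curRows.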
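
The a1/a2 case can be illustrated with a small counter that tracks written lines rather than logical rows. This is a sketch under assumptions: the type and its fields are hypothetical and only mirror the Count = BatchSize × linesInTable idea from GetPrediction, not the actual mxbench progress code.

```go
package main

import "fmt"

// lineProgress counts inserted lines (pieces of a row), not logical rows.
// Hypothetical type for illustration only.
type lineProgress struct {
	batchSize    int64 // lines per logical row
	logicalRows  int64 // rows that will finally be stored
	writtenLines int64 // lines inserted so far
}

// totalLines is the value reported as totalRows under line-based counting.
func (p lineProgress) totalLines() int64 { return p.batchSize * p.logicalRows }

// completeRows is the number of logical rows whose lines have all been
// written, assuming lines are written in order, row by row.
func (p lineProgress) completeRows() int64 { return p.writtenLines / p.batchSize }

func main() {
	// One logical row A split into a1 and a2; only a1 has been stored.
	p := lineProgress{batchSize: 2, logicalRows: 1, writtenLines: 1}
	fmt.Printf("lines: %d/%d, complete rows: %d/%d\n",
		p.writtenLines, p.totalLines(), p.completeRows(), p.logicalRows)
}
```

Counting lines reports 1 of 2 for the half-written row, whereas counting logical rows would report 0 of 1 until a2 arrives.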
@ShiyiWei1995

@ShiyiWei1995
Author

ShiyiWei1995 commented Aug 7, 2023

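OK, understood. Let's shelve this issue for now and see whether, from the product side, the UI feature for this statistic can be adjusted so that the concept of rows is replaced with the concept of batches. Thanks for the hard work 💦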
