when uploading dataset with pre-annotation results, tempDataId is not unique. #240

Closed
chanyoung1998 opened this issue Apr 19, 2024 · 8 comments
chanyoung1998 (Contributor) commented Apr 19, 2024

Describe

When I uploaded a dataset with pre-annotation results, the process seemed to work normally. However, I found that some data_annotation_object records were missing.

UploadDataUseCase is in charge of uploading the dataset. In particular, a tempDataId is assigned to each dataAnnotationObjectBO before insertion into the DB, and the id generated by the database insertion is then mapped back to that tempDataId.

In this process I found some duplicate tempDataId values, which causes ids to be mapped to the wrong tempDataId. As a result, uploading pre-annotation results does not work properly and omits some results.

list.forEach(subDataNameList -> parseExecutorService.submit(Objects.requireNonNull(TtlRunnable.get(() -> {
  var dataInfoBOList = new ArrayList<DataInfoBO>();
  var dataAnnotationObjectBOList = new ArrayList<DataAnnotationObjectBO>();
  try {
      subDataNameList.forEach(dataName -> {
          var dataFiles = this.getSingleDataFiles(sceneFile, dataName, errorBuilder, datasetType);
          if (CollectionUtil.isNotEmpty(dataFiles)) {
              log.info("dataStart,dataName:{},dataFiles:{}",dataName,dataFiles.stream().map(File::getName).collect(Collectors.toList()));
              var tempDataId = ByteUtil.bytesToLong(SecureUtil.md5().digest(UUID.randomUUID().toString()));
              var dataAnnotationObjectBO = dataAnnotationObjectBOBuilder.build();
              // assign tempDataId
              dataAnnotationObjectBO.setDataId(tempDataId);
              handleDataResult(sceneFile, dataName, dataAnnotationObjectBO, dataAnnotationObjectBOList, errorBuilder);
              var fileNodeList = this.assembleContent(dataFiles, rootPath, dataInfoUploadBO);
              log.info("Get data content,frameName:{},content:{} ", dataName, JSONUtil.toJsonStr(fileNodeList));
              var dataInfoBO = dataInfoBOBuilder.build();
              dataInfoBO.setName(dataName);
              dataInfoBO.setOrderName(NaturalSortUtil.convert(dataName));
              dataInfoBO.setContent(fileNodeList);
              dataInfoBO.setTempDataId(tempDataId);
              dataInfoBOList.add(dataInfoBO);
          }
      });
      if (CollectionUtil.isNotEmpty(dataInfoBOList)) {
          log.info("dataInfoBOList:{}",dataInfoBOList.stream().map(DataInfoBO::getTempDataId).collect(Collectors.toList()));
          log.info("dataAnnotationObjectBOList:{}",dataAnnotationObjectBOList.stream().map(DataAnnotationObjectBO::getDataId).collect(Collectors.toList()));
          var resDataInfoList = this.insertBatch(dataInfoBOList, datasetId, errorBuilder, sceneId);
          // mapping data id to tempDataId
          this.saveBatchDataResult(resDataInfoList, dataAnnotationObjectBOList);
      }
  } catch (Exception e) {
      log.error("Handle data error", e);
  } finally {
      parsedDataNum.set(parsedDataNum.get() + subDataNameList.size());
      var uploadRecordBO = uploadRecordBOBuilder.parsedDataNum(parsedDataNum.get()).build();
      uploadRecordDAO.updateById(DefaultConverter.convert(uploadRecordBO, UploadRecord.class));
      countDownLatch.countDown();
  }
}))));
try {
  countDownLatch.await();
} catch (InterruptedException e) {
  log.error("Parse point cloud count down latch error", e);
}
});

Mapping the generated data id back to tempDataId:

public void saveBatchDataResult(List<DataInfoBO> dataInfoBOList, List<DataAnnotationObjectBO> dataAnnotationObjectBOList) {

The mapping is not done properly:

[screenshot]
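For reference, here is a minimal standalone sketch of how such a tempDataId-to-id mapping could look. The BO field names and the map-based implementation are my assumptions for illustration, not the project's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TempIdMappingSketch {

    // Hypothetical minimal BOs; the real classes live in the project.
    static class DataInfoBO {
        Long id;          // DB-generated id, known only after insertBatch
        Long tempDataId;  // temporary id assigned before insertion
        DataInfoBO(Long id, Long tempDataId) { this.id = id; this.tempDataId = tempDataId; }
    }

    static class DataAnnotationObjectBO {
        Long dataId;      // holds tempDataId before mapping, the real id after
        DataAnnotationObjectBO(Long dataId) { this.dataId = dataId; }
    }

    // After insertBatch returns rows carrying their DB-generated ids,
    // replace each annotation object's tempDataId with the real id.
    static void saveBatchDataResult(List<DataInfoBO> dataInfoBOList,
                                    List<DataAnnotationObjectBO> dataAnnotationObjectBOList) {
        Map<Long, Long> tempToReal = new HashMap<>();
        // If two rows shared the same tempDataId, the later one would silently
        // overwrite the earlier mapping here, and some annotation objects would
        // end up pointing at the wrong row -- the omission described above.
        dataInfoBOList.forEach(d -> tempToReal.put(d.tempDataId, d.id));
        dataAnnotationObjectBOList.forEach(o -> o.dataId = tempToReal.get(o.dataId));
    }

    public static void main(String[] args) {
        List<DataInfoBO> infos = List.of(new DataInfoBO(101L, 7L), new DataInfoBO(102L, 8L));
        List<DataAnnotationObjectBO> objs = new ArrayList<>(List.of(
                new DataAnnotationObjectBO(7L), new DataAnnotationObjectBO(8L)));
        saveBatchDataResult(infos, objs);
        System.out.println(objs.get(0).dataId + "," + objs.get(1).dataId); // prints 101,102
    }
}
```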

Cause

tempDataId is not generated uniquely under multi-threading.

var tempDataId = ByteUtil.bytesToLong(SecureUtil.md5().digest(UUID.randomUUID().toString()));

test code

@RepeatedTest(100)
public void UUIDMultiThreadNotEnsureUniqueTempDataId() throws InterruptedException {
    // given
    int threadCount = 5;
    int dataInfoCount = 100;
    ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
    CountDownLatch latch = new CountDownLatch(dataInfoCount);
    Set uuidSet = new HashSet();

    // when
    for (int i = 0; i < dataInfoCount; i++) {
        executorService.submit(() -> {
            try {
                uuidSet.add(ByteUtil.bytesToLong(SecureUtil.md5().digest(UUID.randomUUID().toString())));
            } finally {
                latch.countDown();
            }
        });
    }
    latch.await();

    // then
    Assertions.assertEquals(uuidSet.size(), dataInfoCount);
}

test result

[screenshot]

Suggest

Use a synchronized block to make sure tempDataId is generated uniquely:

test code

@RepeatedTest(100)
public void UUIDMultiThreadInSynchronousBlock() throws InterruptedException {
    // given
    int threadCount = 5;
    int dataInfoCount = 100;
    ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
    CountDownLatch latch = new CountDownLatch(dataInfoCount);
    Set uuidSet = new HashSet();

    // when
    for (int i = 0; i < dataInfoCount; i++) {
        executorService.submit(() -> {
            try {
                synchronized (this) {
                    uuidSet.add(ByteUtil.bytesToLong(SecureUtil.md5().digest(UUID.randomUUID().toString())));
                }
            } finally {
                latch.countDown();
            }
        });
    }
    latch.await();

    // then
    Assertions.assertEquals(uuidSet.size(), dataInfoCount);
}

test result

[screenshot]

chanyoung1998 added a commit to chanyoung1998/xtreme1 that referenced this issue Apr 19, 2024
when uploading dataset with pre-annotation results, tempDataId is not generated uniquely.
jaggerwang assigned fanyinbo and unassigned jaggerwang Apr 22, 2024
fanyinbo (Collaborator) commented:

The error in the test code is due to HashSet being thread-unsafe; there is no problem with generating tempDataId itself. Changing the test code to Set uuidSet = Collections.synchronizedSet(new HashSet<>()); removes the failures. In our code, both dataInfoBOList and dataAnnotationObjectBOList are used within a single thread, not shared across threads.
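A self-contained sketch of this fix, with the Hutool and JUnit dependencies replaced by plain JDK code (the UUID-to-long step is only a stand-in for ByteUtil.bytesToLong(SecureUtil.md5().digest(...)), not the library call itself):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SynchronizedSetSketch {

    // Generates n ids concurrently and collects them into a thread-safe Set.
    static Set<Long> collectIds(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        CountDownLatch latch = new CountDownLatch(n);
        // The fix: a synchronized wrapper, so concurrent add() calls are not lost
        // the way they can be with a bare HashSet.
        Set<Long> idSet = Collections.synchronizedSet(new HashSet<>());
        for (int i = 0; i < n; i++) {
            pool.submit(() -> {
                try {
                    // Stand-in for the md5-of-UUID id generation:
                    // derive a 64-bit value from a fresh random UUID.
                    UUID u = UUID.randomUUID();
                    idSet.add(u.getMostSignificantBits() ^ u.getLeastSignificantBits());
                } finally {
                    latch.countDown();
                }
            });
        }
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return idSet;
    }

    public static void main(String[] args) {
        // Prints 100: with a thread-safe Set no adds are lost, and a collision
        // among 100 random 64-bit values is astronomically unlikely.
        System.out.println(collectIds(100).size());
    }
}
```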

chanyoung1998 (Contributor, author) commented Apr 22, 2024

Thank you for your comment! As you pointed out, my test code has a problem. What I want to make clear, then, is this: should each tempDataId be assigned to exactly one dataAnnotationObjectBO, rather than the same tempDataId being shared by more than one dataAnnotationObjectBO? Is that right?

fanyinbo (Collaborator) commented:

The same tempDataId can be assigned to different dataAnnotationObjectBOs; one Data can have multiple Objects.

chanyoung1998 (Contributor, author) commented:

Oh, I got it. I misunderstood. But I still can't understand why there are such large data ids (36844111888~) in the DB screenshot attached to my first comment.
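For context, the tempDataId generator hashes a random UUID into a long, so it produces an essentially arbitrary 64-bit value that looks nothing like a normal auto-increment row id. A rough JDK-only approximation of that generator (assumption: the first 8 bytes of the digest are taken as the long; Hutool's actual byte order may differ, but the magnitude argument is the same):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class TempDataIdMagnitude {

    // Approximates ByteUtil.bytesToLong(SecureUtil.md5().digest(uuid)):
    // MD5 a random UUID string and read the first 8 digest bytes as a long.
    static long tempDataId() {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, 8).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // The values are essentially uniform over all 64-bit longs: about half
        // are negative, and positive ones dwarf any real database row id.
        for (int i = 0; i < 3; i++) {
            System.out.println(tempDataId());
        }
    }
}
```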

fanyinbo (Collaborator) commented:

The previous code had a bug. Was the large data id in your screenshot generated by the latest version or by the previous code?

chanyoung1998 (Contributor, author) commented Apr 22, 2024

That data id was generated by the previous code. I will check the upstream code; sorry for wasting your time.

But now that I have checked, there isn't much difference in the code, right? Just builder -> set. Do these changes affect how tempDataId is generated?

fanyinbo (Collaborator) commented:

Yes.

chanyoung1998 (Contributor, author) commented:

Okay, I will update my code and check this issue again. Thank you very much for reviewing this issue!
