With the emergence of Large Language Models (LLMs), researchers now train on a large number of data sets. However, the unsupervised training of LLMs has produced many data sets that cannot be traced, so it is often unclear whether the data sets in use meet the requirements. A training data set has numerous important properties that should be known, and we intend to establish a standard for Artificial Intelligence (AI) training data sets that makes them traceable. We propose AIBOM, which is inspired by the Software Bill of Materials (SBOM) in software engineering and uses standardised formats to simplify the administration of data that arrives in various formats. After discussions between Xihe Platform and lawyers, we agreed that integrating the AIBOM standard can support data governance plans and reduce data governance risks.
The specific content of AIBOM and its explanations are shown in the figure. Contributions from anyone interested are welcome.
Next we plan to do the following:
1. Design the data format as a metadata header.
2. Design data samples. We assume that changes to the training data and changes to the annotations do not affect each other, and design the following examples:
   2.1 Example of training data changes and label version updates
   2.2 Example display of a data query
3. Implement the code based on points 1 and 2.
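To make the plan above more concrete, here is a minimal sketch of what a metadata header with independent data and label versions could look like. It is only an illustration under stated assumptions: the class `DatasetHeader`, its fields (`name`, `source`, `license`, `data_version`, `label_version`) and the helpers `update_data`, `update_labels` and `query` are hypothetical names chosen for this example and are not part of the AIBOM standard shown in the figure.

```python
from dataclasses import dataclass, replace


# Hypothetical metadata header; field names are assumptions for illustration,
# not the actual AIBOM field list.
@dataclass(frozen=True)
class DatasetHeader:
    name: str
    source: str
    license: str
    data_version: int = 1   # bumped when the training data changes
    label_version: int = 1  # bumped when the annotations change


def update_data(header: DatasetHeader) -> DatasetHeader:
    """Return a new header for changed training data; labels are untouched."""
    return replace(header, data_version=header.data_version + 1)


def update_labels(header: DatasetHeader) -> DatasetHeader:
    """Return a new header for changed annotations; data is untouched."""
    return replace(header, label_version=header.label_version + 1)


def query(headers, **criteria):
    """Return the headers whose fields equal every given criterion."""
    return [
        h for h in headers
        if all(getattr(h, key) == value for key, value in criteria.items())
    ]


# 2.1: data changes and label updates are versioned independently.
v1 = DatasetHeader(name="demo-corpus", source="example", license="CC-BY-4.0")
v2 = update_data(v1)      # data_version 2, label_version 1
v3 = update_labels(v2)    # data_version 2, label_version 2

# 2.2: query the recorded headers by any combination of fields.
history = [v1, v2, v3]
matches = query(history, name="demo-corpus", data_version=2)
```

Keeping the two version counters separate reflects the assumption that training data and annotations change independently, so each recorded header identifies one specific combination of data and labels.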