
Design and Implementation of File for AI Data #202

Closed
viviTjwan opened this issue Oct 26, 2023 · 2 comments
Labels: ai, enhancement (New feature or request)


@viviTjwan

The emergence of Large Language Models (LLMs) has led researchers to train models on a large number of data sets. However, the unsupervised training of LLMs has produced many data sets that cannot be traced, so it is now difficult to determine whether the data sets being used meet the applicable requirements. Because AI training data sets have many important properties that should be known, we intend to establish a standard for Artificial Intelligence (AI) training data sets that makes them traceable. Inspired by the SBOM (Software Bill of Materials) in software engineering, we propose AIBOM, which uses standardised formats to simplify the management of data stored in various formats. After discussions between the Xihe Platform and lawyers, we agreed that integrating the AIBOM standard can support data governance plans and reduce data governance risks.
The specific contents of AIBOM and their explanations are shown in the figure. Anyone interested is welcome to contribute.
[Figure: AIBOM]

Next we plan to do the following:

  1. Design the data format as a metadata header
  2. Design data samples
    We assume that changes to the training data and changes to the annotations do not affect each other, and design the following examples:
    2.1 Example of training-data changes and label version updates
    2.2 Sample display of a data query
  3. Implement the code based on points 1 and 2
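The plan above could be prototyped roughly as follows. This is a minimal Python sketch under our own assumptions, not the AIBOM specification: the actual AIBOM fields are defined in the figure, so the field names in `AIBOMHeader` (dataset name, source, license, version numbers, SHA-256 digests) and the helpers `update_data`, `update_labels` and `query` are hypothetical. It illustrates step 1 (a metadata header), step 2.1 (training data and labels versioned independently, following the assumption stated above) and step 2.2 (a simple query over the version history).

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class AIBOMHeader:
    """Hypothetical metadata header for one dataset release.

    Field names are illustrative only; the real AIBOM fields are those in the figure.
    """
    dataset_name: str
    source: str
    license: str
    data_version: int = 1
    label_version: int = 1
    data_sha256: str = ""
    label_sha256: str = ""


def digest(payload: bytes) -> str:
    """Content hash used to tie a header to the exact bytes it describes."""
    return hashlib.sha256(payload).hexdigest()


def update_data(h: AIBOMHeader, new_data: bytes) -> AIBOMHeader:
    # Only the training-data fields change; annotation fields are left untouched,
    # reflecting the assumption that data and labels evolve independently.
    return replace(h, data_version=h.data_version + 1, data_sha256=digest(new_data))


def update_labels(h: AIBOMHeader, new_labels: bytes) -> AIBOMHeader:
    # Only the annotation fields change; training-data fields are left untouched.
    return replace(h, label_version=h.label_version + 1, label_sha256=digest(new_labels))


def query(history: list, **criteria) -> list:
    """Return every header in the history whose fields match all given criteria."""
    return [h for h in history if all(getattr(h, k) == v for k, v in criteria.items())]


def to_json(h: AIBOMHeader) -> str:
    """Serialise a header so it can be stored next to the data it describes."""
    return json.dumps(asdict(h), sort_keys=True)


# Usage with made-up data: one data update followed by one label update.
v1 = AIBOMHeader(
    "demo-corpus", "https://example.org/demo", "CC-BY-4.0",
    data_sha256=digest(b"raw text v1"), label_sha256=digest(b"labels v1"),
)
v2 = update_data(v1, b"raw text v2")
v3 = update_labels(v2, b"labels v2")
history = [v1, v2, v3]

print([(h.data_version, h.label_version) for h in history])  # [(1, 1), (2, 1), (2, 2)]
print(len(query(history, data_version=2)))  # 2
print(to_json(v3))
```

Keeping each header immutable and deriving a new one per change means the full history stays available for traceability, which is what the query in step 2.2 relies on.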
@genedna added the enhancement (New feature or request) label on Nov 10, 2023
@genedna (Member) commented Nov 19, 2023

@viviTjwan,

I have already removed the mda subcommand from mega in #236; you can add the functionality from this issue to craft, the git extension of Mega.

@Stephenson131313

This comment was marked as spam.

@genedna closed this as completed on May 13, 2024