cd unitformat
pip install -e .
cd ../llmchat
pip install -e .
cd ..
pip install -e .

Solid Electrolyte Paper Extractor CLI
{filter,llm,disperse,gather,stack,reduct,paths}
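filter 1. Filter pdf/txt files in target Science field.
Batch-filter papers.
llm 2. Extract solid electrolyte data from pdf/txt to csv file.
Extract data from papers.
disperse 3. Disperse a CSV file into multiple files for manual tagging.
Split the extraction result of a single paper into multiple tables that serve as annotation tasks.
gather 4. Gather multiple CSVs (one paper, corrected) along axis=1.
Restore the manually corrected task tables into a single table for the paper.
stack 5. Concatenate multiple CSV files (different papers) to csv file along axis=0.
Merge the table data of multiple papers into one master table.
reduct 6. Reduce and transform units in a CSV file.
Apply unit conversion and other data processing to the master table so that it meets the database ingestion standard.
paths Get paths of all files in a directory.
Get the paths of all files in a directory that meet the requirements.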
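
The `gather` and `stack` descriptions use the pandas-style axis convention: axis=1 joins tables side by side (more columns), while axis=0 appends tables one below another (more rows). The sketch below illustrates the two directions on in-memory rows using only the standard library. It is not the tool's implementation: the helper names `gather_columns` and `stack_rows` and the sample columns and values are hypothetical.

```python
# Illustration only: hypothetical helpers, not the `per` implementation.

def gather_columns(tables):
    """Join tables side by side (axis=1): same rows, more columns."""
    n_rows = len(tables[0])
    if any(len(t) != n_rows for t in tables):
        raise ValueError("all tables must have the same number of rows")
    gathered = []
    for parts in zip(*tables):
        row = {}
        for part in parts:
            row.update(part)  # later tables win if a column name repeats
        gathered.append(row)
    return gathered


def stack_rows(tables):
    """Append tables one below another (axis=0): same columns, more rows."""
    stacked = []
    for table in tables:
        stacked.extend(table)
    return stacked


# Two annotation tasks split from one paper (made-up sample values).
task_a = [{"material": "A"}, {"material": "B"}]
task_b = [{"conductivity": "1e-3"}, {"conductivity": "2e-4"}]
paper_1 = gather_columns([task_a, task_b])

# A second paper with the same columns, stacked below the first.
paper_2 = [{"material": "C", "conductivity": "5e-5"}]
all_papers = stack_rows([paper_1, paper_2])

print(len(paper_1), len(all_papers))
```

Read this way, `disperse` and `gather` are inverse operations on the columns of a single paper's table, and `stack` then combines the per-paper tables row-wise.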
/per-sample
per -h
e.g. per llm ...
Example 1: Edit the necessary paths, then run
per llm -c xxx/per-sample/extractor.json
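
Before running the command, it can help to confirm that the paths in the config file actually exist on your machine. The following is a minimal sketch under stated assumptions: it assumes the config is plain JSON and makes no assumption about its keys, so it simply walks every value and reports strings that look like file paths but do not exist. The helper name `missing_paths` and the demo keys are hypothetical and are not part of the tool.

```python
import json
import os
import tempfile


def missing_paths(value, found=None):
    """Recursively collect string values that look like paths but do not exist."""
    if found is None:
        found = []
    if isinstance(value, dict):
        for item in value.values():
            missing_paths(item, found)
    elif isinstance(value, list):
        for item in value:
            missing_paths(item, found)
    elif isinstance(value, str) and "://" not in value:
        # Heuristic: treat strings containing a path separator as paths; skip URLs.
        if ("/" in value or os.sep in value) and not os.path.exists(value):
            found.append(value)
    return found


# Demo with a throwaway config; the keys below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "extractor.json")
    demo = {
        "input_dir": tmp,
        "output_csv": os.path.join(tmp, "missing", "out.csv"),
    }
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(demo, f)
    with open(config_path, encoding="utf-8") as f:
        report = missing_paths(json.load(f))
    print(report)
```

To check a real config, pass the loaded contents of your own `extractor.json` to `missing_paths`. Output locations that are only created at run time may be listed too, so treat the report as a prompt to review rather than as an error.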