zhang-js25/integrated_extractor

Installation

Install the base tools

cd unitformat
pip install -e .

cd ../llmchat
pip install -e .

Install the standalone tool for each domain, e.g. the solid electrolyte extractor

cd ..
pip install -e .

Available features

Solid Electrolyte Paper Extractor CLI

{filter,llm,disperse,gather,stack,reduct,paths}

filter              1. Batch-filter PDF/TXT papers, keeping those in the target scientific field.
llm                 2. Extract solid electrolyte data from PDF/TXT papers into a CSV file.
disperse            3. Split the CSV extracted from a single paper into multiple tables that serve as manual tagging tasks.
gather              4. Gather the manually corrected CSVs of one paper back into a single table (along axis=1).
stack               5. Concatenate the CSV files of different papers into one master CSV (along axis=0).
reduct              6. Reduce the master CSV and convert its units so that the data meets the database import standard.
paths               Get the paths of all matching files in a directory.
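
The difference between disperse, gather (axis=1) and stack (axis=0) can be sketched in plain Python. This is only an illustration, not the tool's implementation. The function bodies, the column names and the sample values are made up. It is also an assumption that disperse splits a table by columns, inferred from gather joining along axis=1.

```python
from typing import Dict, List

Row = Dict[str, str]
Table = List[Row]


def disperse(table: Table, column_groups: List[List[str]]) -> List[Table]:
    """Split one paper's table column-wise, one smaller table per tagging task.

    Assumption: the split is by columns; the real CLI may split differently.
    """
    return [[{col: row[col] for col in group} for row in table]
            for group in column_groups]


def gather(parts: List[Table]) -> Table:
    """Rejoin the column-wise parts of ONE paper side by side (axis=1)."""
    if len({len(part) for part in parts}) > 1:
        raise ValueError("all parts must have the same number of rows")
    gathered: Table = []
    for rows in zip(*parts):
        merged: Row = {}
        for row in rows:
            merged.update(row)
        gathered.append(merged)
    return gathered


def stack(tables: List[Table]) -> Table:
    """Append the rows of DIFFERENT papers under each other (axis=0)."""
    columns: List[str] = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Columns missing from a paper are filled with an empty string.
    return [{col: row.get(col, "") for col in columns}
            for table in tables for row in table]


# Hypothetical placeholder data, not taken from any real paper.
paper_a: Table = [{"sample": "A-1", "conductivity": "1.0", "temperature": "25"}]
paper_b: Table = [{"sample": "B-1", "conductivity": "2.0"}]

tasks = disperse(paper_a, [["sample"], ["conductivity", "temperature"]])
restored = gather(tasks)              # same columns and rows as paper_a
master = stack([restored, paper_b])   # one row per paper, shared columns
```

In this sketch gather undoes disperse for a single paper, while stack grows the table downward as more papers are added.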

Examples

/per-sample

CLI entry point

per -h

For example: per llm ...

Example 1: update the required paths, then run

per llm -c xxx/per-sample/extractor.json
