# 1. Creating a corpus using the `RISmed` package
### Wheat
In order not to overload the E-utility servers, NCBI recommends that users post no more than three
URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM
Eastern time during weekdays. Failure to comply with this policy may result in an IP address being
blocked from accessing NCBI.
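One way to stay under the three-requests-per-second limit when issuing several queries in a loop is to pause between calls. The helper below is a minimal sketch in base R; `throttled_lapply` and its arguments are hypothetical names introduced here for illustration and are not part of `RISmed` or of NCBI's tooling.

```r
# Hypothetical helper: apply `fun` to each element of `items`, waiting between
# calls so that at most `max_per_sec` calls are started per second.
throttled_lapply <- function(items, fun, max_per_sec = 3) {
  min_gap <- 1 / max_per_sec   # minimum number of seconds between two calls
  last_call <- -Inf            # start time of the previous call
  lapply(items, function(item) {
    wait <- min_gap - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    fun(item)
  })
}

# Offline demonstration with a trivial function instead of a real E-utilities call.
demo_result <- throttled_lapply(1:4, function(x) x * 2)
```

In real use, `fun` could wrap a call such as `EUtilsSummary()` for each query string; that wiring is left out here so the sketch runs without network access.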

In [1]:
library(RISmed)
search_query <- EUtilsSummary("Triticum aestivum", type = "esearch", db ="pubmed",  retmax=1000, mindate=1950, maxdate=2020)
summary(search_query)
# retmax = maximum number of records to retrieve (default: 1000),
# so at most 1000 of the matching records are retrieved here
# total result: 32359 articles (as of 12.03.2020)

Query:
("triticum"[MeSH Terms] OR "triticum"[All Fields] OR ("triticum"[All Fields] AND "aestivum"[All Fields]) OR "triticum aestivum"[All Fields]) AND 1950[EDAT] : 2020[EDAT] 

Result count:  32359

In [2]:
# to see the ids of our returned query
head(QueryId(search_query))

In [3]:
# export query into data.frame:
wheat <- EUtilsGet(search_query, type = "efetch", db = "pubmed")
class(wheat)
pubmed_wheat <- data.frame("Title"=ArticleTitle(wheat),"Abstract"=AbstractText(wheat), "PMID"=PMID(wheat))
head(pubmed_wheat)

Title,Abstract,PMID
Development and characterization of &lt;i&gt;Triticum turgidum- Aegilops comosa&lt;/i&gt; and &lt;i&gt;T. turgidum -Ae. markgrafii&lt;/i&gt; amphidiploids.,,32160479
"Economically Optimal Wheat Yield, Protein and Nitrogen Use Component Responses to Varying N Supply and Genotype.",,32158450
Exogenous application of spermine and putrescine mitigate adversities of drought stress in wheat by protecting membranes and chloroplast ultra-structure.,,32158131
The overexpression of high-molecular-weight glutenin subunit Bx7 improves the dough rheological properties by altering secondary and micro-structures of wheat gluten.,,32156364
,,32155734
,,32153608


In [8]:
# replace commas in the abstracts with spaces before writing to CSV
# (optional: write.csv() quotes character fields by default)
pubmed_wheat$Abstract <- as.character(pubmed_wheat$Abstract)
pubmed_wheat$Abstract <- gsub(",", " ", pubmed_wheat$Abstract, fixed = TRUE)
# export csv for documentation
write.csv(pubmed_wheat, file = "RISmed_wheat.csv", row.names = FALSE)
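As a side note on the comma handling: `write.csv()` quotes character fields by default, so text containing commas should survive a round trip unchanged. The following sketch checks this on a one-row toy data frame; `csv_demo` and the temporary file are illustrative only and do not touch the real export.

```r
# Toy data frame with commas inside a text field (not the real corpus).
csv_demo <- data.frame(
  Abstract = "Yield, protein, and nitrogen use",
  PMID = "32158450",
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back as character columns.
tmp_csv <- tempfile(fileext = ".csv")
write.csv(csv_demo, file = tmp_csv, row.names = FALSE)
csv_back <- read.csv(tmp_csv, stringsAsFactors = FALSE, colClasses = "character")

# TRUE if the commas inside the field were preserved by quoting.
round_trip_ok <- identical(csv_back$Abstract, csv_demo$Abstract)
```

If the round trip holds for the real data as well, the `gsub()` step could be dropped so that the abstracts keep their original punctuation.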

### Barley

In [10]:
search_query2 <- EUtilsSummary("Hordeum vulgare", type = "esearch", db ="pubmed",  retmax=1000, mindate=1950, maxdate=2020)
summary(search_query2)
# total result: 11542 articles (as of 12.03.2020)

Query:
("hordeum"[MeSH Terms] OR "hordeum"[All Fields] OR ("hordeum"[All Fields] AND "vulgare"[All Fields]) OR "hordeum vulgare"[All Fields]) AND 1950[EDAT] : 2020[EDAT] 

Result count:  11542

In [11]:
# to see the ids of our returned query
head(QueryId(search_query2))

In [13]:
# export query into data.frame:
barley <- EUtilsGet(search_query2, type = "efetch", db = "pubmed")
class(barley)
pubmed_barley <- data.frame("Title"=ArticleTitle(barley),"Abstract"=AbstractText(barley), "PMID"=PMID(barley))
head(pubmed_barley)

Title,Abstract,PMID
,,32158138
Exploring microbial dynamics associated with flavours production during highland barley wine fermentation.,"Highland barley wine (HBW) is a well-known grain wine in Qinghai-Tibet Plateau, China and is mainly fermented by local Qu (a traditional starter) with highland barley (Hordeum vulgare, Qingke (Tibetan hulless barley)), and the flavors profiles associated with microbiota succession during HBW fermentation are unrevealed. Hence, high-throughput sequencing (HTS) technology was used to investigate the dynamic changes of microbial community for the duration of the fermentation. In addition, metabolites were analyzed by gas chromatography-mass spectrometry (GC-MS) and high performance liquid chromatography (HPLC). A total of 66 volatile compounds and 7 organic acids were identified during the traditional brewing process. Results showed that the composition of microbiota varied over the fermentation process. The bacterial genera (relative abundance &gt; 0.1%) decreased from 13 at 0 h to 4 encompassing Leuconostoc (13.53%) and Acetobacter (74.60%) after 48 h fermentation, whilst the structure of fungal community was more uniform in comparison with bacteria, as Rhizopus and Saccharomyces were predominant throughout the fermentation. Furthermore, the correlations between microbiota and the detected compounds were also explored, which highlighted that three bacterial genera, including Acetobacter, Leuconostoc, Bacillus and one fungal genus Rhizopus were significantly correlated with main flavours compounds (|r| &gt; 0.7, FDR &lt; 0.01). To conclude, the detailed information provided by this study offer screening strategies of beneficial bacterial and fungal strains to improve the quality of HBW.",32156405
,,32153619
Socio-ecological factors determine crop performance in agricultural systems.,"Agricultural production systems are affected by complex interactions between social and ecological factors, which are often hard to integrate in a common analytical framework. We evaluated differences in crop production among farms by integrating components of several related research disciplines in a single socio-ecological analysis. Specifically, we evaluated spring barley (Hordeum vulgare, L.) performance on 34 farms (organic and conventional) in two agro-ecological zones to unravel the importance of ecological, crop and management factors in the performance of a standard crop. We used Projections to Latent Structures (PLS), a simple but robust analytical tool widely utilized in research disciplines dealing with complex systems (e.g. social sciences and chemometrics), but infrequently in agricultural sciences. We show that barley performance on organic farms was affected by previous management, landscape structure, and soil quality, in contrast to conventional farms where external inputs were the main factors affecting biomass and grain yield. This indicates that more complex management strategies are required in organic than in conventional farming systems. We conclude that the PLS method combining socio-ecological and biophysical factors provides improved understanding of the various interacting factors determining crop performance and can help identify where improvements in the agricultural system are most likely to be effective.",32144284
Effect of N supply on the carbon economy of barley when accounting for plant size.,"Nitrogen availability and ontogeny both affect the relative growth rate (RGR) of plants. In this study of barley (Hordeum vulgare L.) we determined which growth parameters are affected by nitrate (N) availability, and whether these were confounded by differences in plant size, reflecting differences in growth. Plants were hydroponically grown on six different nitrate (N) concentrations for 28 days, and nine harvests were performed to assess the effect of N on growth parameters. Most growth parameters showed similar patterns of responses to N supply whether compared at common time points or common plant sizes. N had a significant effect on the biomass allocation: increasing N increased leaf mass ratio (LMR) and decreased root mass ratio (RMR). Specific leaf area (SLA) was not significantly affected by N. RGR increased with increasing N supply up to 1 mM, associated with increases in both LMR and net assimilation rate (NAR). Increases in N supply above 1 mM did not increase RGR as increases in LMR were offset by decreases in NAR. The high RGR at suboptimal N supply suggest a higher nitrogen use efficiency (biomass/N supply). The reasons for the homeostasis of growth under suboptimal N levels are discussed.",32135075
,,32133024


In [None]:
# replace commas in the abstracts with spaces before writing to CSV
# (optional: write.csv() quotes character fields by default)
pubmed_barley$Abstract <- as.character(pubmed_barley$Abstract)
pubmed_barley$Abstract <- gsub(",", " ", pubmed_barley$Abstract, fixed = TRUE)
# export csv for documentation
write.csv(pubmed_barley, file = "RISmed_barley.csv", row.names = FALSE)

#### Issue:
* Many titles/abstracts are not recognised and are returned as `NA`, so the corpus does not contain the full search results.
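To put a number on this issue, one could count the missing entries per column. The sketch below uses a small toy data frame with the same column names as `pubmed_wheat`; `pubmed_demo` and `is_missing` are hypothetical names introduced here.

```r
# Toy stand-in for pubmed_wheat / pubmed_barley (the real ones come from EUtilsGet()).
pubmed_demo <- data.frame(
  Title = c("A title", NA, ""),
  Abstract = c(NA, NA, "Some abstract text"),
  PMID = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Treat both NA and empty/whitespace-only strings as missing.
is_missing <- function(x) is.na(x) | trimws(x) == ""

# Number of missing entries per text column.
missing_summary <- sapply(pubmed_demo[c("Title", "Abstract")],
                          function(col) sum(is_missing(col)))

# PMIDs whose abstract is missing, e.g. for re-fetching with another package.
pmids_without_abstract <- pubmed_demo$PMID[is_missing(pubmed_demo$Abstract)]
```

Running the same two expressions on `pubmed_wheat` and `pubmed_barley` would show how much of each 1000-record sample is affected.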

#### Advantage
* Quick and easy way to get a corpus-formatted `data.frame` of most article abstracts together with their PMIDs.

# 2. Creating a corpus using the `easyPubMed` package
### Search for *Triticum aestivum* only

In [3]:
library(easyPubMed)

In [4]:
my_query <- "Triticum aestivum"
my_entrez_id <- get_pubmed_ids(my_query)
my_abstracts_txt <- fetch_pubmed_data(my_entrez_id, format = "abstract")
head(my_abstracts_txt)

In [6]:
str(my_entrez_id)

List of 9
 $ Count           : chr "32568"
 $ RetMax          : chr "20"
 $ RetStart        : chr "0"
 $ QueryKey        : chr "1"
 $ WebEnv          : chr "NCID_1_1445237_130.14.18.97_9001_1584630119_1787764021_0MetA0_S_MegaStore"
 $ IdList          :List of 20
  ..$ Id: chr "32185038"
  ..$ Id: chr "32184793"
  ..$ Id: chr "32184345"
  ..$ Id: chr "32182810"
  ..$ Id: chr "32181822"
  ..$ Id: chr "32181421"
  ..$ Id: chr "32179229"
  ..$ Id: chr "32174947"
  ..$ Id: chr "32174943"
  ..$ Id: chr "32174929"
  ..$ Id: chr "32173497"
  ..$ Id: chr "32172236"
  ..$ Id: chr "32170056"
  ..$ Id: chr "32168957"
  ..$ Id: chr "32168374"
  ..$ Id: chr "32165865"
  ..$ Id: chr "32160479"
  ..$ Id: chr "32158450"
  ..$ Id: chr "32158131"
  ..$ Id: chr "32156364"
 $ TranslationSet  :List of 2
  ..$ From: chr "Triticum aestivum"
  ..$ To  : chr "\"triticum\"[MeSH Terms] OR \"triticum\"[All Fields] OR (\"triticum\"[All Fields] AND \"aestivum\"[All Fields])"| __truncated__
 $ QueryTranslation: chr "
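Note that `RetMax` is 20 while `Count` is 32568, so the returned list holds only the first 20 PMIDs. The sketch below shows how the IDs and the total count could be pulled out of such a list; `entrez_demo` is a hand-built stand-in that copies a few fields from the output above, not an object returned by `get_pubmed_ids()`.

```r
# Hand-built stand-in mirroring part of the structure printed above.
entrez_demo <- list(
  Count = "32568",
  RetMax = "20",
  IdList = list(Id = "32185038", Id = "32184793", Id = "32184345")
)

# Flatten the nested ID list into a plain character vector.
pmids <- unlist(entrez_demo$IdList, use.names = FALSE)

# Count is stored as a string, so convert it before doing arithmetic.
n_total <- as.integer(entrez_demo$Count)
n_not_listed <- n_total - length(pmids)
```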

Here we download the query results as XML in batches; `batch_pubmed_download()` writes each batch to a file and returns the file names:

In [5]:
setwd("~/Documents/EasyPM/")

In [6]:
easy.A <- batch_pubmed_download(pubmed_query_string = my_query, 
                               dest_dir = "~/Documents/EasyPM/",
                               format = "xml", 
                               api_key = "532056952c2098c0cd03a43bc25e345a7f08",
                               batch_size = 5000,
                               dest_file_prefix = "easyPM_wheat",
                               res_cn = 1,
                               encoding = "UTF8")

[1] "PubMed data batch 1 / 7 downloaded..."
[1] "PubMed data batch 2 / 7 downloaded..."
[1] "PubMed data batch 3 / 7 downloaded..."
[1] "PubMed data batch 4 / 7 downloaded..."
[1] "PubMed data batch 5 / 7 downloaded..."
[1] "PubMed data batch 6 / 7 downloaded..."
[1] "PubMed data batch 7 / 7 downloaded..."
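The seven batches in the log follow from the record count and `batch_size`. A quick check of that arithmetic, assuming the download saw roughly the `Count` of 32568 reported earlier (the variable names are illustrative only):

```r
total_records <- 32568   # Count reported by get_pubmed_ids() earlier in this section
batch_size <- 5000       # value passed to batch_pubmed_download()

# Number of batches needed to cover all records.
n_batches <- ceiling(total_records / batch_size)

# Zero-based offset of the first record in each batch, and the size of the last batch.
batch_starts <- seq(0, by = batch_size, length.out = n_batches)
last_batch_size <- total_records - batch_starts[n_batches]
```

Under these assumptions the first six files hold 5000 records each and the seventh holds the remainder.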


In [7]:
print(easy.A)

[1] "easyPM_wheat01.txt" "easyPM_wheat02.txt" "easyPM_wheat03.txt"
[4] "easyPM_wheat04.txt" "easyPM_wheat05.txt" "easyPM_wheat06.txt"
[7] "easyPM_wheat07.txt"


In [7]:
#new_data <- easy.A[[1]]
easy_wheat <- table_articles_byAuth(pubmed_data = "~/Documents/EasyPM/easyPM_wheat01.txt",
                                    included_authors = "first",
                                    max_chars = -1,
                                    autofill = TRUE,
                                    dest_file = "wheat_abstracts",
                                    getKeywords = FALSE,
                                    encoding = "ASCII")
head(easy_wheat)

Processing PubMed data .................................................. done!


pmid,doi,title,abstract,year,month,day,jabbrv,journal,keywords,lastname,firstname,address,email
32160479,10.1139/gen-2019-0215,Development and characterization of &lt;i&gt;Triticum turgidum- Aegilops comosa&lt;/i&gt; and &lt;i&gt;T. turgidum -Ae. markgrafii&lt;/i&gt; amphidiploids.,"&lt;i&gt;Aegilops comosa &lt;/i&gt;and &lt;i&gt;Ae. markgrafii &lt;/i&gt;are diploid progenitors of polyploidy Aegilops species sharing M and C genomes, respectively. Transferring valuable genes/traits from Aegilops into wheat is an alternative strategy for wheat genetic improvement. The amphidiploids between diploid Aegilops species and tetraploid wheat can act as bridges to overcome obstacles from direct hybridization and can be developed by the union of unreduced gametes. In this study, we developed seven &lt;i&gt;T. turgidum-Ae. comosa&lt;/i&gt; and two &lt;i&gt;T. turgidum- Ae. markgrafii&lt;/i&gt; amphidiploids. The unreduced gametes mechanisms, including first-division restitution (FDR) and single-division meiosis (SDM), were observed in triploid F&lt;sub&gt;1&lt;/sub&gt; hybrids of &lt;i&gt;T. turgidum-Ae. comosa&lt;/i&gt; (STM) and &lt;i&gt;T. turgidum - Ae. markgrafii&lt;/i&gt; (STC). Only FDR was observed in STC hybrids, whereas FDR or both FDR and SDM were detected in the STM hybrids. All seven pairs of M chromosomes of&lt;i&gt; Ae.comosa&lt;/i&gt; and C chromosomes of &lt;i&gt;Ae. markgrafii &lt;/i&gt;were distinguished by fluorescent in situ hybridization (FISH) probes pSc119.2 and pTa71 combinations with pTa-535 and (CTT)&lt;sub&gt;12&lt;/sub&gt;/(ACT)&lt;sub&gt;7&lt;/sub&gt;, respectively. Meanwhile, the chromosomes of tetraploid wheat and diploid Aegilops parents were distinguished by the same FISH probes. The amphidiploids possessed specific valuable traits such as multiple tillers, large seed size-related traits, and stripe rust resistance that could be utilized in the genetic improvement of wheat.",2020,3,11,Genome,Genome,,Zuo,Yuanyuan,"Sichuan Agricultural University, 12529, Triticeae Research Institute, Wenjiang, Chengdu city, Sichuan, China",
32158450,10.3389/fpls.2019.01790,"Economically Optimal Wheat Yield, Protein and Nitrogen Use Component Responses to Varying N Supply and Genotype.","Improvements in market value of hard red spring wheat (HRS, <i>Triticum aestivum L</i>.) are linked to breeding efforts to increase grain protein concentration (GPC). Numerous studies have been conducted on the identification, isolation of a chromosome region (<i>Gpc-B1</i>) of Wild emmer wheat (<i>Triticum turgidum</i> spp. <i>dicoccoides</i>) and its introgression into commercial hard wheat to GPC. Yet there has been limited research published on the comparative responsiveness of these altered lines and their parents to varied N supply. There is increased awareness that wheat genetic improvements must be assessed over a range of environmental and agronomic management conditions to assess stability. We report herein on economically optimal yield, protein and nitrogen use efficiency (NUE) component responses of two Pacific Northwestern USA cultivars, Tara and Scarlet compared to backcrossed derived near isolines with or without the <i>Gpc-B1</i> allele. A field experiment with 5 N rates as whole plots and 8 genotypes as subplots was conducted over two years under semi-arid, dryland conditions. One goal was to evaluate the efficacy of the <i>Gpc-B1</i> allele under a range of low to high N supply. Across all genotypes, grain yield responses to N supply followed the classic Mitscherlich response model, whereas GPC followed inverse quadratic or linear responses. The <i>Gpc-B1</i> introgression had no major impact on grain protein, but grain N and total above ground crop N yields demonstrated quadratic responses to total N supply. Generally, higher maximum grain yields and steeper rise to the maxima (Mitscherlich c values) were obtained in the first site-year. Tara required less N supply to achieve GPC goals than Scarlet in both site-years. 
Genotypes with <i>Gpc-B1</i> produced comparable or slightly lower Mitscherlich A values than unmodified genotypes, but displayed similar Mitscherlich c values. Target GPC goals were not achieved at economic optimal yields based on set wheat pricing. Economic optimization of N inputs to achieve protein goals showed positive revenue from additional N inputs for most genotypes. While N uptake efficiency did not drop below 0.40, N fertilizer-induced increases in grain N harvest correlated well with unused post-harvest soil N that is potentially susceptible to environmental loss.",2020,3,11,Front Plant Sci,Frontiers in plant science,,Pan,William L,"Nutrient Cycling, Rhizosphere Ecology Laboratory, Department of Crop and Soil Sciences, Washington State University, Pullman, WA, United States",
32158131,10.1007/s12298-019-00744-7,Exogenous application of spermine and putrescine mitigate adversities of drought stress in wheat by protecting membranes and chloroplast ultra-structure.,"Polyamines (PAs) are positively charged molecules known to mitigate drought stress; however, little is known about their mechanism of alleviating drought stress. We investigated the effects of PAs exogenously applied as a seed primer and as a foliar spray on the growth, membrane stability (MS), electrolyte leakage (EL), Na<sup>+</sup> and K<sup>+</sup> cations, reactive oxygen species (ROS), catalase (CAT; EC 1.11.1.6) and guaiacol peroxidase (GPX; EC 1.11.1.7) activity and chloroplast ultra-structure in wheat (<i>Triticum aestivum</i> L.; cv. Sakha-94) under drought stress. Three PA solutions, namely, putrescine, spermine and a mixture of the two (Mix), were each applied at a concentration of 100....M. Our study demonstrated that the retardation of chlorophyll loss and elevation of Rubisco levels were involved in PA-enhanced growth under drought stress. These relationships were mainly reflected in elevated fresh weight and dry weight in response to foliar spraying with all PA solutions and seed priming with the Mix solution. The elevated growth seemed to be due to increased photosynthetic pigments, protein and Rubisco. In contrast, drought decreased growth, photosynthetic pigments, protein and Rubisco. MS was enhanced by PAs applied as a seed primer or foliar spray, as shown by clear reductions in EL %, malondialdehyde (MDA) content and the Na<sup>+</sup>/K<sup>+</sup> ratio as well as reduced ROS markers and elevated CAT (but not GPX) activity. Further study showed that the Mix solution of PAs, applied either during seed priming or as a foliar spray, improved chloroplast ultra-structure, suggesting that improvements in Rubisco and photosynthetic pigments were involved in PA maintenance of chloroplast stability. 
Therefore, the present study showed that elevated CAT activity is the main mechanism through which PAs reduce ROS and MDA, thereby improving MS and protecting mesophyll cells structurally and functionally under drought stress in wheat.",2020,3,11,Physiol Mol Biol Plants,Physiology and molecular biology of plants : an international journal of functional plant biology,,Hassan,Nemat,"Botany and Microbiology Department, Faculty of Science, Damietta University, Damietta, Egypt",
32156364,10.1016/j.foodres.2019.108914,The overexpression of high-molecular-weight glutenin subunit Bx7 improves the dough rheological properties by altering secondary and micro-structures of wheat gluten.,"Bread wheat (Triticum aestivum L.) is one of the crucial cereals consumed by human beings and wheat gluten, the natural macromolecules, mainly determines the processing quality of wheat dough. The high-molecular-weight glutenin subunits (HMW-GSs) of gluten proteins are recognized as one of the main components regulating the rheological properties of dough. The overexpressed Bx7 subunit (Bx7<sup>OE</sup>) has been reported to improve wheat quality and rheological properties of dough, however its effect on secondary and micro- structures of gluten is still unclear. In this study, we evaluated the composition of main storage proteins in wheat grains of two near-isogenic lines and studied the effect of Bx7 subunit expression level on the secondary structures of gluten and micro-structure of gluten during dough mixing process. Results showed the protein content, HMW-GSs proportion in total glutenins and free sulfhydryl content increased in the flour of HMW-Bx7<sup>OE</sup> wheat line, and the accumulation of unextractable polymeric protein during grain filling stage accelerated. It was found that the content of ..-sheets in secondary structures of gluten increased and a more compact micro-structure of gluten network formed in the dough. Protein network analysis characterized and quantified the alterations in the gluten micro-structure. In the process of dough mixing, protein area, total protein length, number of junctions and branching rate reach the peak at dough development time, which was consistent with Chopin mixing profile. Interestingly, during dough mixing, the above-mentioned parameters of HMW-Bx7<sup>OE</sup> showed less changes than those of HMW-Bx7 wheat line, indicating Bx7<sup>OE</sup> improved the dough stability during mixing. 
To conclude, Bx7<sup>OE</sup> alters the secondary and micro- structures of gluten and thus improves the mixing and rheological properties of wheat dough.",2020,3,11,Food Res. Int.,"Food research international (Ottawa, Ont.)",,Li,Shaopeng,"State Key Laboratory of Crop Stress Biology in Arid Areas and College of Agronomy, Northwest A&amp",
32155734,10.3390/ijms21051812,Global Characterization of GH10 Family Xylanase Genes in <i>Rhizoctonia cerealis</i> and Functional Analysis of Xylanase RcXYN1 During Fungus Infection in Wheat.,"Wheat (<i>Triticum aestivum L</i>.) is an important staple crop. <i>Rhizoctonia cerealis</i> is the causal agent of diseases that are devastating to cereal crops, including wheat. Xylanases play an important role in pathogenic infection, but little is known about xylanases in <i>R. cerealis</i>. Herein, we identified nine xylanase-encoding genes from the <i>R. cerealis</i> genome, named <i>RcXYN1-RcXYN9</i>, examined their expression patterns, and investigated the pathogenicity role of RcXYN1. RcXYN1-RcXYN9 proteins contain two conserved glutamate residues within the active motif in the glycoside hydrolase 10 (GH10) domain. Of them, RcXYN1-RcXYN4 are predicted to be secreted proteins. <i>RcXYN1-RcXYN9</i> displayed different expression patterns during the infection process of wheat, and <i>RcXYN1</i>, <i>RcXYN2</i>, <i>RcXYN5</i>, and <i>RcXYN9</i> were expressed highly across all the tested inoculation points. Functional dissection indicated that the RcXYN1 protein was able to induce necrosis/cell-death and H<sub>2</sub>O<sub>2</sub> generation when infiltrated into wheat and <i>Nicotiana benthamiana</i> leaves. Furthermore, application of RcXYN1 protein followed by <i>R. cerealis</i> led to significantly higher levels of the disease in wheat leaves than application of the fungus alone. These results demonstrate that RcXYN1 acts as a pathogenicity factor during <i>R. cerealis</i> infection in wheat. This is the first investigation of xylanase genes in <i>R. cerealis</i>, providing novel insights into the pathogenesis mechanisms of <i>R. 
cerealis</i>.",2020,3,11,Int J Mol Sci,International journal of molecular sciences,,Lu,Lin,"Institute of Crop Sciences, National Key Facility for Crop Gene Resources and Genetic Improvement, Chinese Academy of Agricultural Sciences, Beijing 100081, China",
32153608,10.3389/fpls.2020.00097,Single Nucleotide Mutagenesis of the <i>TaCHLI</i> Gene Suppressed Chlorophyll and Fatty Acid Biosynthesis in Common Wheat Seedlings.,"Wheat (<i>Triticum aestivum</i> L.) is one of the most important crops in the world. Chlorophyll plays a vital role in plant development and crop improvement and further determines the crop productivity to a certain extent. The biosynthesis of chlorophyll remains a complex metabolic process, and fundamental biochemical discoveries have resulted from studies of plant mutants with altered leaf color. In this study, we identified a chlorophyll-deficiency mutant, referred to as <i>chli</i>, from the wheat cultivar Shaannong33 that exhibited an obvious pale-green leaf phenotype at the seedling stage, with significantly decreased accumulation of chlorophyll and its precursors, protoporphyrin IX and Mg-protoporphyrin IX. Interestingly, a higher protoporphyrin IX to Mg-protoporphyrin IX ratio was observed in <i>chli</i>. Lipid biosynthesis in <i>chli</i> leaves and seeds was also affected, with the mutant displaying significantly reduced total lipid content relative to Shaanong33. Genetic analysis indicated that the pale-green leaf phenotype was controlled by a single pair of recessive nuclear genes. Furthermore, sequence alignment revealed a single-nucleotide mutation (G664A) in the gene TraesCS7A01G480700.1, which encodes subunit I of the Mg-chelatase in plants. This single-nucleotide mutation resulted in an amino acid substitution (D221N) in the highly conserved domain of subunit I. As a result, mutant protein Tachli-7A lost the ability to interact with the normal protein TaCHLI-7A, as assessed by yeast two-hybrid assay. Meanwhile, <i>Tachli-7A</i> could not recover the chlorophyll deficiency phenotype of the <i>Arabidopsis thaliana</i> SALK_050029 mutant. 
Furthermore, we found that in Shaannong33, the protoporphyrin IX to Mg-protoporphyrin IX ratio was growth state-dependent and insensitive to environmental change. Overall, the mutation in Tachli-7A impaired the function of Mg-chelatase and blocked the conversion of protoporphyrin IX to Mg- protoporphyrin IX. Based on our results, the <i>chli</i> mutant represents a potentially useful resource for better understanding chlorophyll and lipid biosynthetic pathways in common wheat.",2020,3,10,Front Plant Sci,Frontiers in plant science,,Wang,Chaojie,"College of Agronomy, Northwest A&amp",
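The cell above parses only the first of the seven downloaded files. To build a table from all batches, one option is to parse each file and stack the results. The sketch below is an assumption about how that could be done; `combine_batches` and `toy_parse` are hypothetical helpers, and the toy parser stands in for `table_articles_byAuth()` so the example runs offline.

```r
# Hypothetical helper: parse every batch file with `parse_fun` and stack the tables.
combine_batches <- function(files, parse_fun) {
  tables <- lapply(files, parse_fun)
  do.call(rbind, tables)
}

# Toy parser standing in for table_articles_byAuth(); returns one row per file.
toy_parse <- function(f) {
  data.frame(pmid = paste0(f, "_1"), source_file = f, stringsAsFactors = FALSE)
}
combined <- combine_batches(c("batch01.txt", "batch02.txt"), toy_parse)

# With the real files, something along these lines might work (untested here):
# all_wheat <- combine_batches(
#   file.path("~/Documents/EasyPM", easy.A),
#   function(f) table_articles_byAuth(pubmed_data = f, included_authors = "first",
#                                     max_chars = -1, encoding = "ASCII")
# )
```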


### Search for the full *Triticeae* tribe
Here I compare two search strategies for retrieving the PMIDs: querying for *Triticeae* alone versus naming each of the larger genera individually.

In [8]:
triticeae <- '"Triticeae" AND "Agropyron" AND "Anthosachne" AND "Elymus" AND "Hordeum" AND "Kengyilia" AND "Leymus" AND "Aegilops" AND "Thinopyrum" AND "Triticum"'
triticeae_id <- get_pubmed_ids(triticeae)

In [9]:
str(triticeae_id)

List of 9
 $ Count           : chr "0"
 $ RetMax          : chr "0"
 $ RetStart        : chr "0"
 $ QueryKey        : chr "1"
 $ WebEnv          : chr "NCID_1_1643406_130.14.22.76_9001_1584633297_1007523029_0MetA0_S_MegaStore"
 $ QueryTranslation: chr "\"Triticeae\"[All Fields] AND \"Agropyron\"[All Fields] AND \"Anthosachne\"[All Fields] AND \"Elymus\"[All Fiel"| __truncated__
 $ IdList          : Named list()
 $ TranslationSet  : list()
 $ OriginalQuery   : chr "\"Triticeae\"+AND+\"Agropyron\"+AND+\"Anthosachne\"+AND+\"Elymus\"+AND+\"Hordeum\"+AND+\"Kengyilia\"+AND+\"Leym"| __truncated__
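The `Count` of 0 is what one would expect from joining the terms with `AND`: a record has to mention all ten names to match. Joining them with `OR` instead would match records that mention any of them. A minimal sketch of building such a query string (the names `genera` and `triticeae_or` are introduced here for illustration):

```r
# Tribe and genus names taken from the query above.
genera <- c("Triticeae", "Agropyron", "Anthosachne", "Elymus", "Hordeum",
            "Kengyilia", "Leymus", "Aegilops", "Thinopyrum", "Triticum")

# Wrap each term in double quotes and join with OR instead of AND.
quoted_terms <- sprintf('"%s"', genera)
triticeae_or <- paste(quoted_terms, collapse = " OR ")

# The resulting string could then be passed to get_pubmed_ids() (needs network access):
# triticeae_or_id <- get_pubmed_ids(triticeae_or)
```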


In [None]:
# fetch the abstracts for the Triticeae query (not the wheat query from above)
my_abstracts_txt <- fetch_pubmed_data(triticeae_id, format = "abstract")
head(my_abstracts_txt)

# 3. Creating a corpus by downloading different file types from a PubMed search on the website - 0.2
### Medline
After the search, choose "Send to: File" and select the format "Medline".
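A downloaded Medline file is plain text with one tagged field per line (for example `PMID- `, `TI  - `, `AB  - `) and indented continuation lines. The following is a rough base-R sketch of turning such lines into a data frame; `parse_medline` is a hypothetical helper written for this notebook, it assumes every record starts with a `PMID- ` line, and it has only been tried on the toy input shown.

```r
# Hypothetical parser: returns one row per record with the requested tags as columns.
parse_medline <- function(lines, tags = c("PMID", "TI", "AB")) {
  rec_id <- cumsum(grepl("^PMID- ", lines))        # a new record starts at each PMID line
  keep <- rec_id > 0 & nzchar(trimws(lines))       # drop blank lines and any preamble
  lines <- lines[keep]
  rec_id <- rec_id[keep]
  records <- lapply(split(lines, rec_id), function(rec) {
    is_field <- grepl("^[A-Z]{2,4} *- ", rec)      # lines that open a new tagged field
    field_id <- cumsum(is_field)                   # continuation lines join the field above
    field_tag <- trimws(sub("-.*$", "", rec[is_field]))
    text <- sub("^([A-Z]{2,4} *- | +)", "", rec)   # strip the tag or the indentation
    field_val <- tapply(text, field_id, paste, collapse = " ")
    vapply(tags, function(tg) {
      hit <- field_val[field_tag == tg]
      if (length(hit) == 0) NA_character_ else unname(hit[1])
    }, character(1))
  })
  as.data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
}

# Toy input imitating the layout of a Medline export (not real records).
medline_demo <- c(
  "PMID- 1",
  "TI  - Toy title one.",
  "AB  - First line of the abstract",
  "      continues here.",
  "",
  "PMID- 2",
  "TI  - Toy title two."
)
parsed <- parse_medline(medline_demo)

# For a real export, the lines would come from the downloaded file, e.g.
# parsed <- parse_medline(readLines("pubmed_result.txt"))  # file name is an assumption
```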

# 4. Creating a corpus using the `rentrez` package - 0.2
### Wheat

In [None]:
## library(rentrez)
## wheat_table <- read.table("~/Documents/marker_genes/text_mining//pubmed_example", header = TRUE, sep = ",")