## GPT Article-Filter Version (PubMed_scraper_GPT)

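This notebook uses `langchain`, `openai`, `Bio` (Biopython), `nltk`, and several other libraries for two jobs. It searches PubMed for articles that mention a gene together with autism. It then has GPT filter out articles in which a short gene symbol is a homonym, for example "AR" meaning augmented reality rather than androgen receptor.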

The code cell below sets the `OPENAI_API_KEY` environment variable. It also points `openai.api_base` at the proxy endpoint "https://fmops.ai/api/v1/proxy/openai/v1" instead of OpenAI's default API URL.

Below are the libraries used:

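1. `langchain` : Framework for building applications on top of large language models. Here it provides `PromptTemplate`, `LLMChain`, and the `ChatOpenAI` chat-model wrapper.

2. `nltk` : Natural Language Toolkit. Here it supplies the VADER sentiment analyzer (`SentimentIntensityAnalyzer`).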

3. `openai` : Used to access the OpenAI API for generating human-like text.

4. `os` : The OS module in Python provides functions for interacting with the operating system.

5. `pymed` : Python wrapper for the PubMed API. The searches in this notebook go through `Bio.Entrez` instead.

6. `pandas` : A data manipulation and analysis library.

7. `re` : Python's built-in module to work with Regular Expressions.

8. `time` : This module provides various time-related functions.

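9. `requests` : Used for making HTTP requests. Here it looks up gene full names on the mygene.info API.

10. `Bio` : Biopython, a set of freely available tools for biological computation. Its `Entrez` module wraps the NCBI E-utilities used to search and fetch PubMed records.

11. `docx` : python-docx, a library for creating and updating Microsoft Word (.docx) files.

12. `spacy` : Library for advanced natural language processing. Here it loads the scispaCy biomedical NER models.

13. `wordcloud` : Generates word-cloud images, in which more frequent words are drawn larger.

14. `docx.shared` : python-docx module with shared helper classes such as the `Inches` length unit, used to size the inserted images.

15. `PIL` (Pillow) : Image library. `generate_wordcloud` uses `Image.new` to return a blank image when no entities are found.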


In [11]:
#GPT Article-Filter Version
from langchain import OpenAI, ConversationChain, LLMChain, PromptTemplate
from langchain.memory import ConversationBufferWindowMemory
from langchain.chat_models import ChatOpenAI
import nltk
from nltk import tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import openai
import os
from pymed import PubMed
import pandas as pd
import re
import time
import requests
from Bio import Entrez
from docx import Document
import spacy
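from wordcloud import WordCloud
from docx.shared import Inches
from PIL import Image  # Pillow; generate_wordcloud uses Image.new for an empty placeholder

# Do not hardcode a real API key here; export OPENAI_API_KEY before starting Jupyter,
# or replace the placeholder with your own key and keep it out of version control
os.environ.setdefault('OPENAI_API_KEY', '<your-openai-api-key>')
openai.api_base = "https://fmops.ai/api/v1/proxy/openai/v1"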

nltk.download('punkt')
nltk.download('vader_lexicon')

## Creating Templates for Context and Gene

In this part of the code, we create prompt templates. Each one fixes the wording of a question sent to the model and leaves placeholders for the values that change per article.

The **first template**, `template_Context`, asks the model to explain in detail what the term `gene` means in the supplied text. The text is passed as the `abstract` variable and the gene symbol as the `gene` variable.

The **second template**, `template_Gene1`, asks whether `gene` is used in the text to mean the gene `fullName` (or a transcription factor). It requests a Yes or No answer. In `article_Interest`, the text passed as `sentence` is the explanation returned by the first chain, not the raw abstract.

Each template is wrapped in a `PromptTemplate`, and an `LLMChain` pairs that prompt with a `ChatOpenAI` model. A temperature of 0 makes the model choose the most likely tokens, so answers are close to deterministic. The API still does not guarantee identical output across calls.

In [3]:
template_Context = """<Question: Explain in detail what {gene} is in the Text Provided?><Text: {abstract}>
Your Answer(Do not use abbreviation):"""

prompt_Context = PromptTemplate(
    input_variables=["abstract", "gene"], 
    template=template_Context,
)

keyword_Context = LLMChain(
    llm=ChatOpenAI(temperature=0), 
    prompt=prompt_Context
)

#-----------------------------
template_Gene1 = """
<Is {gene} in the provided Text used in the context of the {fullName} gene or transcription factor? 
Say no if Text says not mentioned or does not appear>
<Text: {sentence}>
Your Answer(Yes or No only):"""

prompt_Gene1 = PromptTemplate(
    input_variables=["sentence", "gene", "fullName"], 
    template=template_Gene1,
)

is_Gene = LLMChain(
    llm=ChatOpenAI(temperature=0), 
    prompt=prompt_Gene1
)
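To see roughly what the second chain sends to the model, a template can be filled in by hand. We assume `PromptTemplate`'s default f-string format, under which plain `{name}` placeholders should behave like `str.format`. The template below is a simplified, hypothetical copy of `template_Gene1`, and the input values are illustrative only.

```python
# Hypothetical, simplified copy of template_Gene1 (not the notebook's exact text)
template = (
    "<Is {gene} in the provided Text used in the context of the {fullName} gene "
    "or transcription factor?>\n"
    "<Text: {sentence}>\n"
    "Your Answer(Yes or No only):"
)

# Illustrative inputs; in the notebook, `sentence` is the explanation
# returned by the keyword_Context chain, not the raw abstract
prompt = template.format(
    gene="CTCF",
    fullName="CCCTC binding factor",
    sentence="CTCF is a zinc-finger protein involved in chromatin looping.",
)
print(prompt)
```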



## Defining Supporting Functions

This portion of the code defines supporting functions for building search terms, looking up gene names, parsing queries, cleaning text, fetching abstracts, and generating word clouds.

1. **gene_to_search**: Builds two PubMed queries, one for the gene symbol and one for its full name. Each query requires the term plus "autism" or "autistic" in the title or abstract. It excludes articles whose title or abstract mentions cancer or tumor.

2. **gene_fullName**: Looks up a gene symbol on the mygene.info API and returns the `name` field of the first hit (for example, `KDM3A` → `lysine demethylase 3A`).

3. **extract_geneInfo**: Returns the search term from a query string, that is, the text between the first `(` and the first `[Title/Abstract]` tag.

4. **remove_html_tags**: Strips HTML/XML tags (such as `<i>`) from a string.

5. **fetch_abstract**: Fetches a PubMed record by PMID through `Entrez.efetch` and joins its abstract sections. If the record has no abstract, it returns "No abstract available".

6. **generate_wordcloud**: Builds a word cloud from four sources: gene-like capitalized nouns, anatomy and cell entities (`nlp1`), disease entities (`nlp2`), and taxon entities. It also adds mentions of "stem cell" or "iPSC". Entities contained in a longer entity are dropped. If nothing remains, a blank white image is returned instead.
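The sketch below shows the query strings that `gene_to_search` produces and how `extract_geneInfo` reads the term back out. `build_query` and `query_term` are hypothetical names. The notebook uses a backslash continuation, which leaves a run of spaces before `NOT`; those spaces are visible in the printed queries, and PubMed appears to ignore them. Implicit string concatenation avoids them entirely.

```python
def build_query(term: str) -> str:
    """Hypothetical helper: same PubMed query as gene_to_search, built by
    implicit string concatenation so no stray indentation ends up in it."""
    return (
        f"({term}[Title/Abstract]) "
        "AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract])) "
        "NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])"
    )


def query_term(query: str) -> str:
    """Mirror of extract_geneInfo: the text between the first '(' and the
    first '[Title/Abstract]' tag is the gene symbol or full name."""
    return query.split("(", 1)[1].split("[Title/Abstract]", 1)[0]


for term in ["CTCF", "CCCTC binding factor"]:
    q = build_query(term)
    print(query_term(q), "->", q)
```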


In [33]:
def gene_to_search(element, fullName):
    ls = []
    ls.append("(" + element + "[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract])) \
    NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])")

    ls.append("(" + fullName + "[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract])) \
    NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])")
        
    return ls

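def gene_fullName(gene_abbr):
    # Query mygene.info for the symbol and return the full name of the first hit
    url = f'https://mygene.info/v3/query?q=symbol:{gene_abbr}&fields=name'
    time.sleep(0.1)  # be gentle with the public API
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    if not data.get('hits'):
        raise ValueError(f"No mygene.info hit for gene symbol {gene_abbr!r}")
    return data['hits'][0]['name']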

def extract_geneInfo(query):
    split_query = query.split('(')
    gene_info = split_query[1].split('[Title/Abstract]')
    return gene_info[0]

def remove_html_tags(text):
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)

def fetch_abstract(pmid):
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="xml")
    records = Entrez.read(handle)
    try:
        abstract_sections = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["Abstract"]["AbstractText"]
        abstract = "\n".join(str(section) for section in abstract_sections)
    except KeyError:
        abstract = "No abstract available"
    return abstract

def generate_wordcloud(input_text, nlp1, nlp2, nlp3):
    doc1 = nlp1(input_text)
    doc2 = nlp2(input_text)
    doc3 = nlp3(input_text)

    gene_pattern = r"^[A-Z]{1}[A-Za-z0-9_-]*[A-Za-z]{1}[A-Za-z0-9_-]*$"
    genes = [token.text for token in doc2 if re.match(gene_pattern, token.text) and token.pos_ == 'NOUN']
    
    entities1 = ['_'.join(ent.text.split()) for ent in doc1.ents if ent.label_ in 
                 {'ORGAN', 'CELL', 'DEVELOPING_ANATOMICAL_STRUCTURE', 'PATHOLOGICAL_FORMATION'}]
    entities2 = ['_'.join(ent.text.split()) for ent in doc2.ents if ent.label_ in 
                 {'DISEASE'}]
    # TAXON is a label of the CRAFT model (nlp3); the BC5CDR model only tags CHEMICAL and DISEASE
    entities3 = [ent.text for ent in doc3.ents if ent.label_ in 
                 {'TAXON'}]
    combined_entities = genes + entities1 + entities2 + entities3
    
    stem_cell_pattern = re.compile(r'\bstem cell\b', re.IGNORECASE)
    ipsc_pattern = re.compile(r'\bipsc\b', re.IGNORECASE)
    if re.search(stem_cell_pattern, input_text):
        combined_entities.append('stem_cell')
    if re.search(ipsc_pattern, input_text):
        combined_entities.append('iPSC')

    filtered_entities = []
    for entity in combined_entities:
        if not any([entity in other_entity and entity != other_entity for other_entity in combined_entities]):
            filtered_entities.append(entity)

    filtered_text = ' '.join(filtered_entities)

    if not filtered_text:
        img = Image.new('RGB', (400, 200), color='white')
        return img

    wordcloud = WordCloud(background_color='white', max_words=100, contour_width=3, contour_color='steelblue')
    wordcloud.generate(filtered_text)

    return wordcloud
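The last step in `generate_wordcloud` drops any entity that occurs inside a longer, different entity. This way "autism" is not counted separately when "autism_spectrum_disorder" is present. The sketch below isolates that step. `drop_contained` is a hypothetical name, and the entity list is illustrative.

```python
def drop_contained(entities):
    """Keep an entity only if it is not a substring of a different entity,
    mirroring the loop at the end of generate_wordcloud."""
    kept = []
    for entity in entities:
        if not any(entity in other and entity != other for other in entities):
            kept.append(entity)
    return kept


entities = ["autism", "autism_spectrum_disorder", "neuron", "CTCF", "neuron"]
print(drop_contained(entities))
```

Exact duplicates are kept, so an entity mentioned several times still grows in the cloud. The nested `any` makes the step quadratic in the number of entities, which is fine for a single abstract.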


## Defining Main Functions

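These main functions run the PubMed searches, score ambiguous articles with GPT, and prepare the text that GPT sees.

1. **search_Query_GPT**: Runs a PubMed query through `Entrez.esearch` (up to 5,000 PMIDs). It fetches each article's title, year, DOI, and abstract, and stores the accepted articles in a DataFrame. Bare all-caps symbols without digits (such as `AR` or `CAT`) get an extra check:
   - Articles that lack the case-sensitive symbol are skipped.
   - Articles that mention the symbol but not the full name go to `article_Interest` when the symbol is shorter than four characters.

2. **article_Interest**: Asks the first chain to explain what the symbol means in the extracted sentences. It then asks the second chain whether that explanation refers to the gene. Finally, it converts the Yes/No reply to 1 or 0 using the sign of VADER's compound sentiment score.

3. **extract_Sentences**: Splits the text into sentences and keeps those containing the target symbol, plus sentences containing a single-word abbreviation in parentheses. Sentences with more than two commas are shortened to their first, matching, and last comma-separated parts, and parenthesized abbreviations are removed.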
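The homonym check inside `search_Query_GPT` decides, per article, whether to skip it, accept it outright, or ask GPT. Pulling that branch logic into a pure function makes it easy to inspect. `gpt_gate` is a hypothetical name, the example texts are made up, and this is a sketch of the notebook's conditions, not its code.

```python
import re


def gpt_gate(element: str, gene: str, full_name: str, text: str) -> str:
    """Return 'skip', 'accept', or 'ask_gpt', following the branches in
    search_Query_GPT (a sketch, not the notebook's code)."""
    # Only all-caps, digit-free search terms (bare symbol queries such as
    # "AR" or "CAT") are checked; full-name queries pass straight through
    if not (element.isupper() and not re.search(r"\d", element)):
        return "accept"
    # Case-sensitive: a lowercase homonym such as "cat" does not count
    if gene not in text:
        return "skip"
    # Full name present, or symbol long enough to be treated as unambiguous
    if full_name.lower() in text.lower() or len(gene) >= 4 or re.search(r"\d", gene):
        return "accept"
    return "ask_gpt"


print(gpt_gate("CAT", "CAT", "catalase", "Children played with a cat."))
print(gpt_gate("AR", "AR", "androgen receptor", "Augmented reality (AR) games were used."))
```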


In [56]:
def search_Query_GPT(query, gene, fullName, keyword_Context_GPT, is_Gene_GPT):
    Entrez.email = "choyoungb@gmail.com"  # NCBI asks for a contact email; replace with your own
    time.sleep(0.5)
    handle = Entrez.esearch(db="pubmed", term=query, retmax=5000)
    record = Entrez.read(handle)
    pmid_list = record["IdList"]
    article_list = []
    element = extract_geneInfo(query)

    for pmid in pmid_list:
        try:
            time.sleep(0.5)
            handle = Entrez.efetch(db="pubmed", id=pmid, rettype="xml")
            time.sleep(1)
            records = Entrez.read(handle)

            try:
                title = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["ArticleTitle"]
                title = remove_html_tags(title)
                pub_date = records["PubmedArticle"][0]["MedlineCitation"]["Article"]["Journal"]["JournalIssue"]["PubDate"]
                article_ids = records["PubmedArticle"][0]["PubmedData"]["ArticleIdList"]
                doi_url = "NA"
                for article_id in article_ids:
                    if article_id.attributes["IdType"] == "doi":
                        doi_url = "https://doi.org/" + article_id
                # PubDate usually has a 'Year' key; some records only carry a free-text 'MedlineDate'.
                # Fall back so the string concatenation below never sees None.
                year = pub_date.get('Year', pub_date.get('MedlineDate', 'n.d.'))
                url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}"
                full_abstract = fetch_abstract(pmid)
                full_abstract = remove_html_tags(full_abstract) if full_abstract else ''
                title_and_abstract = title + full_abstract

                # Homonym guard for bare all-caps symbol queries such as AR or CAT
                if element.isupper() and not re.search(r'\d', element):
                    if gene not in title_and_abstract:
                        continue

                    if fullName.lower() in title_and_abstract.lower() or (len(gene) >= 4 or re.search(r'\d', gene)):
                        pass
                    else:
                        print("------------------------------------------")
                        print(url)
                        temp = extract_Sentences(title_and_abstract, gene)
                        time.sleep(1)
                        score = article_Interest(gene, temp, fullName, keyword_Context_GPT,is_Gene_GPT)
                        print('▶ ' + str(score))
                        # Skip the article unless GPT judged the symbol to mean the gene
                        if score <= 0:
                            continue

                article_dict = {'info': "Url: " + url + "\n" + "DOI: " + doi_url + "\n\n" + "Title(" + year + "): "
                                + title + "\n\n" + full_abstract + "\n\n"}
                article_list.append(article_dict)
                

            except IndexError:
                print(f"|Detected Excerpt, not an abstract, or no doi, with PMID:{pmid}|")
                continue

        except Exception as e:
            print(f"|Detected Excerpt, not an abstract, or no doi, with PMID:{pmid}|")
            print(e)

    search_df = pd.DataFrame(article_list)
    return search_df

def article_Interest(gene, full_Abstract, fullName, keyword_Context_GPT, is_Gene_GPT):
    time.sleep(0.3)
    AI_Context = keyword_Context_GPT.predict(abstract=full_Abstract, gene=gene)
    time.sleep(0.3)
    AI_Gene = is_Gene_GPT.predict(sentence=AI_Context, gene=gene, fullName=fullName)

    print(AI_Context)
    print('▶ ' + AI_Gene)
    sid = SentimentIntensityAnalyzer()
    article_Score = sid.polarity_scores(AI_Gene)
    interest_Score = article_Score['compound']
    
    return 1 if interest_Score > 0 else 0


def extract_Sentences(text, target_keyword):
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    keyword_sentences = [sentence for sentence in sentences if target_keyword in sentence]
    processed_sentences = []
    abbreviation = []

    for sentence in keyword_sentences:
        if sentence.count(',') > 2:
            temp = sentence.split(',')

            keyword_index = -1
            for i, part in enumerate(temp):
                if target_keyword in part:
                    keyword_index = i
                    break

            temp = [temp[0], temp[keyword_index], temp[-1]]

            processed_sentences.append(','.join(temp))
        else:
            processed_sentences.append(sentence)

    # Find sentences containing single-word abbreviations in parentheses
    abbreviation_sentences = [sentence for sentence in sentences if re.search(r'\([A-Za-z]+\)', sentence)]

    # Process abbreviation sentences
    for sentence in abbreviation_sentences:
        if sentence.count(',') > 2:
            temp = sentence.split(',')

            abbreviation_index = -1
            for i, part in enumerate(temp):
                if re.search(r'\([A-Za-z]+\)', part):
                    abbreviation_index = i
                    break

            temp = [temp[0], temp[abbreviation_index], temp[-1]]

            abbreviation.append(','.join(temp))
        else:
            abbreviation.append(sentence)

    combined_text = ' '.join(abbreviation + processed_sentences)
    # Remove single-word parenthesized abbreviations such as " (AR)", including the preceding space
    cleaned_text = re.sub(r'\s\([A-Za-z]+\)', '', combined_text)
    
    # Remove any abbreviation sentence that still appears verbatim after the substitution above
    for abbr in abbreviation:
        cleaned_text = cleaned_text.replace(abbr, '')

    return cleaned_text
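`article_Interest` turns the Yes/No reply into a 0/1 score through VADER's compound polarity. That relies on VADER's lexicon giving "Yes" a positive score and "No" a negative one. A more direct option is to read the first word of the reply. This is a suggested alternative, not the notebook's method, and `answer_to_score` is a hypothetical helper.

```python
import re


def answer_to_score(answer: str) -> int:
    """Hypothetical alternative to the VADER step: 1 if the model's reply
    starts with the word 'yes' (any case), else 0."""
    match = re.match(r"\s*([A-Za-z]+)", answer)
    return 1 if match and match.group(1).lower() == "yes" else 0


for reply in ["Yes", "No.", "yes, it refers to the gene", "Not mentioned", ""]:
    print(repr(reply), answer_to_score(reply))
```

Any reply that does not start with "yes" counts as a rejection. In that case the article is skipped, just as when `article_Interest` returns 0.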

## Loading Spacy Models

Here, we load three scispaCy models for named entity recognition. These are biomedical spaCy pipelines, installed separately from spaCy itself. Loading takes a while because the models are large.

- `en_ner_bionlp13cg_md`: Trained on the BioNLP13CG (cancer genetics) corpus. It recognizes entity types such as organs, cells, and tissues. `generate_wordcloud` uses its ORGAN, CELL, DEVELOPING_ANATOMICAL_STRUCTURE, and PATHOLOGICAL_FORMATION labels.
- `en_ner_bc5cdr_md`: Trained on the BC5CDR corpus. It recognizes chemicals and diseases.
- `en_ner_craft_md`: Trained on the CRAFT corpus. It recognizes genes and gene products, sequence ontology terms, taxa, chemicals, GO terms, and cell types.



In [38]:
#This step takes time
nlp1 = spacy.load("en_ner_bionlp13cg_md")
nlp2 = spacy.load("en_ner_bc5cdr_md")
nlp3 = spacy.load("en_ner_craft_md")

## Setting up PubMed and Loading Gene List

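1. We first set up the `PubMed` tool from `pymed` with a tool name (`"MyTool"`) and a contact email; replace the email with your own. The searches in `search_Query_GPT` go through `Bio.Entrez`, which uses the address assigned to `Entrez.email` inside that function, so change it there as well.

2. We then load the gene symbols to search from the first column of `TF2.xlsx`.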

In [47]:
# Replace with your own contact email
pubmed = PubMed(tool="MyTool", email="ahmed.u0022@gmail.com")  # change to your email

# Replace TF2.xlsx with your own gene list (gene symbols in the first column)
original_file = pd.read_excel('TF2.xlsx')
tf = original_file.iloc[:, 0].tolist()  # all gene symbols from the first column
print(tf)
len(tf)

['AR', 'CAT', 'CTCF', 'KDM3A']


4

## Data Processing and Document Generation

The following steps are performed in this code block:

1. **Initialize DataFrame**: We start by initializing an empty pandas DataFrame with columns 'gene' and 'info'.

2. **Search and Process Genes**: For each gene in the `tf` list, we look up its full name and run both PubMed queries (symbol and full name) through `search_Query_GPT`. The results are merged into the DataFrame. If an article is already there (same `info` string), the gene is appended to that row's `gene` column; otherwise a new row is added.

3. **DataFrame Completion**: After processing all the genes and completing our DataFrame, we print a message indicating the completion of the DataFrame.

4. **Document Initiation**: We then create a Word document containing a one-column table with two rows per article. The first row holds the gene(s); the second holds the article details and its word cloud.

5. **Word Cloud Generation and Insertion**: For each row in our DataFrame, we generate a word cloud from the 'info' column, save it as an image, and insert this image into our Word Document.

6. **Document Saving**: Once we've processed all the rows in the DataFrame and inserted the corresponding word clouds, we save our Word Document and print a message indicating the completion of the document.
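The merge rule in step 2 keeps one row per article and joins every gene whose search returned that article into the `gene` column. It can be sketched with a plain dictionary keyed by the `info` string. `merge_hits` and the sample records are hypothetical.

```python
def merge_hits(hits):
    """hits: iterable of (gene, info) pairs in search order.
    Returns {info: "GENE1, GENE2"}, adding each gene at most once per article."""
    merged = {}
    for gene, info in hits:
        if info in merged:
            # Compare whole symbols, not substrings of the joined string
            if gene not in merged[info].split(", "):
                merged[info] += f", {gene}"
        else:
            merged[info] = gene
    return merged


hits = [
    ("AR", "info for article A"),
    ("CTCF", "info for article B"),
    ("AR", "info for article A"),   # same article returned by the full-name query
    ("KDM3A", "info for article B"),
    ("ARX", "info for article C"),
    ("AR", "info for article C"),
]
print(merge_hits(hits))
```

Checking membership in the split list avoids a substring pitfall. A raw string test like the notebook's `gene not in existing_row['gene'].values[0]` would treat "AR" as already present in a cell that holds "ARX".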


In [57]:
df = pd.DataFrame(columns=['gene', 'info'])


for gene in tf:
    fullName = re.sub(r'[^A-Za-z0-9\s]+', ' ', gene_fullName(gene))
    query_list = gene_to_search(gene, fullName)
    
    
    for search_phrase in query_list:
        print(search_phrase)
        search_df = search_Query_GPT(search_phrase, gene, fullName, keyword_Context, is_Gene)
        for index, row in search_df.iterrows():
            existing_row = df[df['info'] == row['info']]

            if not existing_row.empty:
                if gene not in existing_row['gene'].values[0]:
                    df.loc[existing_row.index, 'gene'] += f", {gene}"
            else:
                row['gene'] = gene
                df = pd.concat([df, row.to_frame().T], ignore_index=True)       
                
print("DataFrame Complete")
                
doc = Document()

table = doc.add_table(rows=2 * len(df), cols=1)

row_idx = 0
for index, row in df.iterrows():
    table.cell(row_idx, 0).text = str(row["gene"])
    row_idx += 1
    # info is "Url/DOI block\n\nTitle(year): title\n\nabstract"; keep only the abstract
    abstract = row["info"].split("\n\n", 2)[-1]

    wordcloud = generate_wordcloud(abstract, nlp1, nlp2, nlp3)

    img_path = f"wordcloud_{index}.png"
    if isinstance(wordcloud, WordCloud):
        wordcloud.to_file(img_path)
    else: 
        wordcloud.save(img_path)


    table.cell(row_idx, 0).text = str(row["info"])
    paragraph = table.cell(row_idx, 0).paragraphs[0]
    run = paragraph.add_run()
    run.add_picture(img_path, width=Inches(6))


    os.remove(img_path)

    row_idx += 1

doc.save("output.docx")
print("docx Generated")

(AR[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/36850787
Augmented Reality (AR) is a technology that overlays digital information, such as images, videos, or 3D models, onto the real world. It enhances the user's perception of reality by adding virtual elements to their environment. AR can be experienced through various devices, such as smartphones, tablets, or specialized AR glasses.

In the context of the provided text, AR is mentioned in relation to the development of interactive environments tailored for the treatment of Autism Spectrum Disorders (ASD). The use of new generation wearable devices enables the creation of immersive applications that combine Virtual Reality (VR) and AR technologies.

AR-based treatment for ASD involves creating virtual scenarios or environments that simulate real-life situations. These

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/35163142
Adenosine receptors (ARs) play a crucial role in the modulation of central nervous system (CNS) activity. Adenosine, a neurotransmitter, interacts with four G-protein coupled receptor subtypes, namely A1, A2A, A2B, and A3. However, this text specifically focuses on the A1 and A2A adenosine receptors.

The A1 adenosine receptor is primarily responsible for inhibitory actions on neurotransmission. When activated by adenosine, it reduces the release of neurotransmitters, such as dopamine, glutamate, and norepinephrine. This inhibition helps regulate neuronal activity and prevents excessive excitability in the CNS.

On the other hand, the A2A adenosine receptor facilitates neurotransmission. Activation of A2A receptors enhances the release of neurotransmitters, particularly dopamine. This facilitation promotes neuronal activity and is involved in various physiological processes, including cognition, motor c

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/34110306
In the provided text, AR refers to the average range of intellectual functioning. It is used as a comparison group in the study to assess the adaptive behavior profiles of intellectually gifted children with Autism Spectrum Disorder (ASD). The study aims to determine if the pattern of declining adaptive functioning observed in children with ASD also applies to intellectually gifted children with ASD, as their higher cognitive abilities may act as a protective factor.

To conduct the study, data from the Simons Simplex Collection were analyzed. The researchers identified 51 participants who had full-scale intelligence scores of 130 or above, which is considered within the intellectually gifted range. This group was labeled as the intellectually gifted range (IGR).

To compare the adaptive behavior profiles of intellectually gifted children with ASD, two additional comparison groups were created. The firs

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/31250215
AR stands for Autism Regression. In the context of the provided text, AR refers to a specific phenotype or subtype of autism spectrum disorder (ASD) characterized by a period of normal development followed by a loss of previously acquired skills or regression in cognitive, social, and/or language abilities.

The study mentioned in the text aimed to evaluate the metabolomic profiles of children with ASD, specifically subclassified into two groups: those with mental regression (AR) and those without regression (ANR). Metabolomics is the study of small molecules or metabolites present in biological samples, such as blood or urine, and it provides insights into the metabolic pathways and processes occurring in an organism.

The researchers included 30 children aged 2-6 years with ASD in their study, with 15 children in each group (AR and ANR). They also compared the metabolomic profiles of these children wi

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/29141583
AR stands for Androgen Receptor. In the provided text, X-chromosome inactivation analysis on the AR gene was performed for all female family members. This analysis involves studying the inactivation of one of the two X chromosomes in females, as they have two copies of the X chromosome. X-chromosome inactivation is a process that occurs in females to ensure equal gene expression between males and females, as males only have one X chromosome.

The AR gene is located on the X chromosome and plays a crucial role in the development and functioning of male sexual characteristics. It codes for the androgen receptor protein, which binds to androgen hormones like testosterone and dihydrotestosterone. This binding activates the receptor and triggers a series of cellular responses that are important for male sexual development and function.

By performing X-chromosome inactivation analysis on the AR gene, researc

Augmented reality (AR) refers to a technology that combines virtual elements with the real world environment. In the context of the provided text, an augmented reality game was used to introduce active video games (AVGs) to children with autism spectrum disorder (ASD). 

AR games utilize a device, such as a smartphone or tablet, to overlay digital content onto the real world. This allows the players to interact with virtual objects or characters that appear as if they are part of their physical surroundings. Unlike virtual reality (VR), which completely immerses the user in a simulated environment, AR enhances the real world by adding virtual elements to it.

In the study mentioned in the text, the researchers used an AR game as a means to introduce AVGs to children with ASD. The purpose was to investigate the effects of active videogame play on socialization in these children. The sessions were observed and coded for communication, positive affect, and aggression.

The results of the 

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/22080249
Autistic regression (AR) refers to a phenomenon observed in some individuals with autism spectrum disorders (ASD) where there is a loss of previously acquired skills or a decline in functioning. This study aimed to investigate the different subtypes of AR and their association with various medical, developmental, and psychiatric factors.

The study included 57 children with ASD who experienced autistic regression. The researchers classified AR into two subtypes: type 1 and type 2. Type 1 AR refers to regression that occurs after a period of normal social and language development. In other words, the child initially develops typically but then experiences a loss of skills. Type 2 AR, on the other hand, involves a worsening of previously reported autistic features. This means that the child's existing autistic symptoms become more severe.

The study found that 56.1% of the children had a history of AR, in

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/12364957
AR, in this context, refers to autistic regression. Autistic regression is a term used to describe a phenomenon where a child who previously showed typical development suddenly loses previously acquired skills and experiences a decline in social, communication, and behavioral abilities. In the provided text, the patient was diagnosed with autism, specifically autistic regression.

Autism, or autism spectrum disorder (ASD), is a neurodevelopmental disorder characterized by difficulties in social interaction, communication challenges, and repetitive patterns of behavior. It is a lifelong condition that typically appears in early childhood.

In the case mentioned, the patient experienced a regression in their autistic symptoms. This means that they had previously acquired certain skills and abilities related to social interaction, communication, and behavior, but then experienced a loss or decline in these

(CAT[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/37357844
Catalase (CAT) is an enzyme that plays a crucial role in protecting cells from oxidative stress by breaking down hydrogen peroxide into water and oxygen. In the provided text, CAT is mentioned in the context of a study investigating the protective effect of Syringic acid in prenatal valproic acid (VPA)-treated rats. 

Valproic acid is a well-known anti-epileptic drug that has been associated with neuroinflammation and autism spectrum disorder (ASD)-like phenotypes. The study aimed to explore the potential of Syringic acid, a polyphenolic compound with anti-inflammatory and neuromodulator activity, to alleviate the effects of VPA-induced autism.

In the study, a single dose of VPA was administered to pregnant rats on the 12th day of gestation. Syringic acid (SA) 

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/36943606
Computer Adaptive Testing (CAT) is a method of administering tests or assessments using computer technology. It is a dynamic and flexible approach that adapts the difficulty level of the questions based on the individual's responses. 

In the context of the provided text, the researchers modified the Social Responsiveness Scale (SRS) for adaptive administration using CAT. The SRS is a widely used tool for quantifying the autism-related phenotype and is also used in health outcomes research. 

To modify the SRS for adaptive administration, the researchers conducted various analyses. One of the analyses performed was item factor analysis, which helps identify the underlying factors or dimensions of the items in the scale. This analysis helps ensure that the items in the scale are measuring the intended construct accurately.

Another analysis conducted was differential item functioning, which examines whet

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/36553414
In the provided text, CAT refers to conventional autism therapy. This therapy is compared to integrative autism therapy (IAT) in a study conducted on children and adolescents with autism spectrum disorder (ASD). The study aimed to examine the effects of these two therapies on multiple physical and social integration domains in individuals with ASD.

The researchers used a two-way repeated analysis of variance to analyze the intervention-related changes in the four domains (physical and social integration) across three time points: pre-test, post-test, and follow-up test. The significance level for determining these changes was set at p < 0.05.

A convenience sample of 24 children with ASD was recruited for the study. These participants underwent either CAT or IAT for 60 minutes per day, twice a week, for a total of 20 sessions over a period of 10 weeks.

The study found promising evidence that IAT was m

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/35042086
The Camouflaging Autistic Traits Questionnaire (CAT-Q) is a self-report instrument that was developed and validated in English to measure social camouflaging. Social camouflaging refers to the behaviors and strategies that individuals with autism use to mask or hide their autistic traits in social contexts. 

The CAT-Q is designed to assess the extent to which individuals engage in camouflaging behaviors. It consists of a series of questions that ask individuals about their experiences and behaviors in social situations. The questionnaire measures various aspects of camouflaging, such as the frequency and intensity of camouflaging behaviors, the reasons for engaging in camouflaging, and the impact of camouflaging on the individual's well-being.

In this particular study, the researchers aimed to validate the Italian version of the CAT-Q and further test its validity and reliability in a large Italian un

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/31657029
The Pediatric Evaluation of Disability Inventory-Computer Adapted Test (PEDI-CAT) is a tool used to assess the functional abilities of children, specifically in the context of neurodevelopmental disorders. It is utilized to determine eligibility for early intervention services funded under the National Disability Insurance Scheme in Australia.

The PEDI-CAT is designed to evaluate a child's performance in various functional domains, such as self-care, mobility, and social/cognitive skills. It is administered through a computer-adapted format, which means that the difficulty level of the test items is adjusted based on the child's responses. This adaptive feature allows for a more precise assessment of the child's abilities, as it tailors the test to their specific skill level.

In the provided text, a study is mentioned that compares the use of the PEDI-CAT with another assessment tool called the Vinela

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/27801722
Computerized adaptive testing (CAT) is a method of administering assessments or surveys using a computer program that adapts the difficulty level of the questions based on the individual's responses. In the context of the provided text, CAT was used to administer the PROMIS Pediatric Parent-Proxy Peer Relationships Measure to parents of children with autism spectrum disorder (ASD).

CAT works by initially presenting the individual with a question of average difficulty. Based on their response, the computer program determines the next question to be presented. If the individual answers correctly, the program selects a slightly more difficult question, and if they answer incorrectly, it selects a slightly easier question. This process continues until a predetermined level of measurement precision is reached.

In this study, the PROMIS Parent-Proxy Peer Relationships Measure was administered to parents of 

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/25312547
In the provided text, CAT refers to the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT). The PEDI-CAT is a tool used to assess the functional skills and abilities of children and adolescents with autism spectrum disorders (ASDs). It is a computerized adaptive test, meaning that the questions and difficulty level of the test are adjusted based on the individual's responses.

The PEDI-CAT-ASD specifically focuses on assessing daily activities and tasks that individuals with ASDs may encounter in their daily lives. It measures various domains such as self-care, mobility, social function, and responsibility for life tasks. The responsibility for life tasks domain specifically evaluates the individual's ability to manage and take responsibility for tasks related to daily living, such as personal hygiene, household chores, and time management.

In the cross-sectional study menti

------------------------------------------
https://pubmed.ncbi.nlm.nih.gov/21846290
CAT stands for Computer Adaptive Test. In the context of the provided text, CAT refers to the Pediatric Evaluation of Disability Inventory-Computer Adaptive Test (PEDI-CAT). It is a computer-based assessment tool that is designed to measure a child's ability to perform activities necessary for personal self-sufficiency and engagement in the community.

The PEDI-CAT was developed as an alternative measure to assess adaptive behavior in children and youth with autism spectrum disorders. It addresses the limitations of current adaptive behavior measures, such as their length and the need for a professional interviewer.

Unlike traditional assessments, the PEDI-CAT utilizes computer technology to administer the test. It adapts to the individual child's responses by adjusting the difficulty level of the questions based on their previous answers. This adaptive feature allows for a more efficient and accurate 

(catalase[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
(CTCF[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
(CCCTC binding factor[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
(KDM3A[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
(lysine demethylase 3A[Title/Abstract]) AND ((AUTISM[Title/Abstract]) OR (autistic[Title/Abstract]))     NOT (CANCER[Title/Abstract]) NOT (TUMOR[Title/Abstract])
DataFrame Complete
docx Generated
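Each `info` string is assembled in `search_Query_GPT` in three blocks separated by blank lines: the URL and DOI lines, the `Title(year): ...` line, and the abstract. Assuming neither of the first two blocks contains a blank line, the abstract is everything after the second blank line. A sketch with a made-up record (`abstract_from_info` is a hypothetical helper):

```python
def abstract_from_info(info: str) -> str:
    """Return the abstract part of an `info` string made of a URL/DOI block,
    a title block, and an abstract block, separated by blank lines."""
    parts = info.split("\n\n", 2)
    return parts[2].strip() if len(parts) == 3 else ""


info = (
    "Url: https://pubmed.ncbi.nlm.nih.gov/00000000\n"
    "DOI: NA\n\n"
    "Title(2023): An illustrative title\n\n"
    "An illustrative abstract.\nSecond abstract section.\n\n"
)
print(abstract_from_info(info))
```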


In [61]:
print(df)

     gene                                               info
0      AR  Url: https://pubmed.ncbi.nlm.nih.gov/36803626\...
1      AR  Url: https://pubmed.ncbi.nlm.nih.gov/35112480\...
2      AR  Url: https://pubmed.ncbi.nlm.nih.gov/35042285\...
3      AR  Url: https://pubmed.ncbi.nlm.nih.gov/34947998\...
4      AR  Url: https://pubmed.ncbi.nlm.nih.gov/33248253\...
..    ...                                                ...
96   CTCF  Url: https://pubmed.ncbi.nlm.nih.gov/33004838\...
97   CTCF  Url: https://pubmed.ncbi.nlm.nih.gov/30377227\...
98   CTCF  Url: https://pubmed.ncbi.nlm.nih.gov/29133437\...
99   CTCF  Url: https://pubmed.ncbi.nlm.nih.gov/22395465\...
100  CTCF  Url: https://pubmed.ncbi.nlm.nih.gov/21725066\...

[101 rows x 2 columns]
