# Ανάκτησης Πληροφορίας - Δημιουργία Μηχανής Αναζήτησης
- Μπηλιώνη Παρασκευή <br> Α.Μ. ice20390286
  
- Πλάγου Αικατερίνη  <br> Α.Μ. ice20390191

# 1. Συλλογή δεδομένων

Εισαγωγή και αρχικοποίηση των απαραίτητων βιβλιοθηκών.

In [1]:
# Import libraries
from bs4 import BeautifulSoup  # Web scraping and parsing 
import requests                # Makes HTTP request
import json                    # Handles JSON data
import nltk                    # Text processing
import string                  # Handles strings
import sys                     # System-specific parameters and functions
from nltk.corpus import stopwords       # Handles stopwords in text processing
from nltk.stem import WordNetLemmatizer # Word lemmatization
from collections import defaultdict     # Creates dictionaries 
import ipywidgets as widgets            # Creates widgets
import numpy as np                      # Does computations
from IPython.display import display, Markdown                # Displays widgets and text
from sklearn.feature_extraction.text import TfidfVectorizer  # TF-IDF vectorization
from sklearn.metrics.pairwise import cosine_similarity       # Calculates vector similarity (used in VSM)
from rank_bm25 import BM25Okapi                              # Rankins documents (used in okapi BM25)
from sklearn.metrics import precision_score, recall_score, f1_score  # Calculates engine evaluation metrics
from IPython.display import display, Markdown               
import pandas as pd   

# Download NLTK datasets
# Tokenizer models
nltk.download('punkt')  
# List of stopwords
nltk.download('stopwords')  
# For lemmatization
nltk.download('wordnet')    

# Initialize stopwords and lemmatizer
stop_words = set(stopwords.words('english'))  # Set of stopwords
lemmatizer = WordNetLemmatizer()              # Lemmatizer for reducing words to base forms


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\vivh\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\vivh\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\vivh\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Επιλογή πηγής από την οποία θα ξεκινήσει η συλλογή εγγράφων. 

In [2]:
# Starting wikipedia link to scrape
wiki_url = 'https://en.wikipedia.org/wiki/Data_analysis'

try:
    # Send request to starting link
    response = requests.get(wiki_url)
    # Raise an exception for errors
    response.raise_for_status()
        
    # Parse HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
        
# Handle any exceptions during the request
except requests.RequestException as e:
    print(f"____Request failed____\n{e}\n")

display(Markdown(f"Data will be scraped from this starting page: [{wiki_url}]({wiki_url})<br><br>"))


Data will be scraped from this starting page: [https://en.wikipedia.org/wiki/Data_analysis](https://en.wikipedia.org/wiki/Data_analysis)<br><br>

Αποθήκευση δεδομένων σε JSON αρχείο.
Σε περίπτωση σφάλματος κατά την διάρκεια της διαδικασίας, εμφανίζεται κατάλληλο μήνυμα λάθους.

In [3]:
def store_things(things, filename):
    try:
        # Open the specified file in write mode
        with open(filename, 'w') as file:
            # Convert data to a JSON string and write it to the file
            json.dump(things, file, indent = 4)

    # If the process fails print error message
    except Exception as e:
        print(f"____Error saving____\n{e}")
        

Απόκτηση εγγράφων/συνδέσμων από την κύρια σελίδα.

In [4]:
def get_links(soup):
    # Base wikipedia url
    https = 'https://en.wikipedia.org'  
    # For storing valid links
    links = []  

    display(Markdown("Links saved: <br>"))
    # Find all anchor tags with 'href' attribute
    for link in soup.find_all('a', href = True):
        # Extract link reference
        url = link.get('href') 

        # Check if the link is a wikipedia aticle and filter out irrelevant links
        if url.startswith('/wiki/') and not any(
            url.startswith(f'/wiki/{keyword}')
            for keyword in ['Wikipedia', 'Help', 'Special', 'Portal', 'Talk', 'Category', 'File', 'Main_Page']):
            # Construct full wikipedia url
            full_url = f"{https}{url}"

            # Skip links that appear if they are already saved
            if full_url not in links:
                links.append(full_url)
                #display(Markdown(f"[{full_url}]({full_url})<br>"))
                print(f"{full_url}\n")
                
        # Collect 70 first valid links
        if len(links) >= 70:
            break 
            
    # Return links
    return links  


# Collect and store links from the main page
links = get_links(soup)
store_things(links, 'wikipedia_collected_urls.json')

# For tracking visited pages
visited_links = set()

# Mark the starting link as visited
visited_links.add(wiki_url)


Links saved: <br>

https://en.wikipedia.org/wiki/Data_analysis

https://en.wikipedia.org/wiki/Statistics

https://en.wikipedia.org/wiki/Data_and_information_visualization

https://en.wikipedia.org/wiki/Exploratory_data_analysis

https://en.wikipedia.org/wiki/Information_design

https://en.wikipedia.org/wiki/Interactive_data_visualization

https://en.wikipedia.org/wiki/Descriptive_statistics

https://en.wikipedia.org/wiki/Statistical_inference

https://en.wikipedia.org/wiki/Statistical_graphics

https://en.wikipedia.org/wiki/Plot_(graphics)

https://en.wikipedia.org/wiki/Infographic

https://en.wikipedia.org/wiki/Data_science

https://en.wikipedia.org/wiki/Tamara_Munzner

https://en.wikipedia.org/wiki/Ben_Shneiderman

https://en.wikipedia.org/wiki/John_Tukey

https://en.wikipedia.org/wiki/Edward_Tufte

https://en.wikipedia.org/wiki/Simon_Wardley

https://en.wikipedia.org/wiki/Hans_Rosling

https://en.wikipedia.org/wiki/David_McCandless

https://en.wikipedia.org/wiki/Kim_Albrecht

https://en.wikipedia.org/

Συλλογή παραγράφων από κάθε σελίδα. 

In [5]:
def get_paragraphs(soup, visited_paragraphs, link):
    # Stores paragraphs
    paragraphs = []  

    # Remove the content from superscripts and references
    for sup in soup.find_all(['sup', 'reflist']):
        sup.decompose()  

    # Find all paragraph tags 
    for p in soup.find_all('p'):
        # Extract text from each paragraph
        text = p.get_text()  
        
        # Remove empty paragraphs and those containing the term 'displaystyle' to avoid mathematical functions
        if text and 'displaystyle' not in text.lower():
            # Calculate the number of words in the paragraph
            word_count = len(text.split())  
            
            # Include paragraphs with word count between 50 and 100 and avoid duplicates
            if 50 <= word_count <= 100 and text not in visited_paragraphs:
                # Store paragraph with source link
                paragraphs.append({'text': text, 'link': link}) 
                # Mark the paragraph as visited to avoid repetition
                visited_paragraphs.add(text)  
                        
    # Return filtered paragraphs         
    return paragraphs  


# For tracking visited paragraphs
visited_paragraphs = set()

# Collect paragraphs from starting page
original_paragraphs = get_paragraphs(soup, visited_paragraphs, wiki_url)

display(Markdown("Paragraphs saved from starting page: <br>"))
for paragraph in original_paragraphs:
    print(f"{paragraph['text']}")


Paragraphs saved from starting page: <br>

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

The data is necessary as inputs to the analysis, which is specified based upon the requirements of those directing the analytics (or customers, who will use the finished product of the analysis). The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained.  Data may be numerical or categorical (i.e., a text

# 2. Προεπεξεργασία κειμένου (Text Procissing)

Διαμόρφωση κειμένου μετά από αφαίρεση διακόπτουσων λέξεων, ειδικών χαρακτήρων και λημματοποίση όρων. 

In [6]:
def clean_text(text):
    # Tokenize text into individual words and convert to lowercase for better search results
    tokens = nltk.word_tokenize(text.lower())

    # List to store cleaned tokens
    cleaned_tokens = []

    # Remove punctuation and non alphanumeric tokens
    for token in tokens:
        if token not in string.punctuation and token.isalnum():
            cleaned_tokens.append(token)

    # List to store tokens after stopword removal
    filtered_tokens = []

    # Remove stopwords from the tokenized text
    for token in cleaned_tokens:
        if token not in stop_words:
            filtered_tokens.append(token)

    # Initialize a list to store lemmatized tokens
    lemmatized_tokens = []

    # Lemmatize the tokens
    for token in filtered_tokens:
        lemmatized_tokens.append(lemmatizer.lemmatize(token))

    # Return processed tokens
    return lemmatized_tokens

cleaned_paragraphs = []

# Clean the collected paragraphs using text preprocessing
for paragraph in original_paragraphs:
    #clean_paragraph = ' '.join(clean_text(paragraph))
    clean_paragraph = clean_text(paragraph['text'])
    cleaned_paragraphs.append({'tokens': clean_paragraph, 'link': paragraph['link']})

display(Markdown("Processed paragraphs from starting page: <br>"))
for paragraph in cleaned_paragraphs:
    print(f"{paragraph['tokens']}\n")
    

Processed paragraphs from starting page: <br>

['data', 'analysis', 'process', 'inspecting', 'cleansing', 'transforming', 'modeling', 'data', 'goal', 'discovering', 'useful', 'information', 'informing', 'conclusion', 'supporting', 'data', 'analysis', 'multiple', 'facet', 'approach', 'encompassing', 'diverse', 'technique', 'variety', 'name', 'used', 'different', 'business', 'science', 'social', 'science', 'domain', 'today', 'business', 'world', 'data', 'analysis', 'play', 'role', 'making', 'decision', 'scientific', 'helping', 'business', 'operate', 'effectively']

['data', 'necessary', 'input', 'analysis', 'specified', 'based', 'upon', 'requirement', 'directing', 'analytics', 'customer', 'use', 'finished', 'product', 'analysis', 'general', 'type', 'entity', 'upon', 'data', 'collected', 'referred', 'experimental', 'unit', 'person', 'population', 'people', 'specific', 'variable', 'regarding', 'population', 'age', 'income', 'may', 'specified', 'obtained', 'data', 'may', 'numerical', 'categorical', 'text', 'label', 'number']

['mathemat

Ληψη και εξαγωγή παραγράφων από κάθε σύνδεσμο. 
Λαμβάνεται το περιεχόμενο από έναν σύνδεσμο και εξάγονται οι παράγραφοι, σε περίπτωση που δεν έχουν ήδη καταχωρηθεί.

In [7]:
def paragraphs_within_links(link, visited_links, visited_paragraphs):
    # Skip the link and return an empty list if it has already been processed
    if link in visited_links:
        return []

    try:
        # Send get request to link and raise error if the request was unsuccessful
        response = requests.get(link)
        response.raise_for_status()
        
        # Parse the pages content
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Mark current link as visited
        visited_links.add(link)
        
        # Extract and return paragraphs from the page
        return get_paragraphs(soup, visited_paragraphs, link)
        
    # Exception for request errors
    except requests.RequestException as e:
        print(f"Error retrieving links: {e}")
        # Return an empty list if there is an error
        return []  

display(Markdown("Scraping paragraphs from each link <br>"))
progress = widgets.IntProgress(
    value = 0,
    min = 0,
    max = len(links)
)

display(progress)

# Get paragraphs from each link and avoid re visiting links
for i, link in enumerate(links):
    # Get paragraphs from current link
    link_paragraphs = paragraphs_within_links(link, visited_links, visited_paragraphs)

    # Extend list of original paragraphs adding the new ones
    original_paragraphs.extend(link_paragraphs)

    # Clean and store paragraphs from current link
    for paragraph in link_paragraphs:
        #clean_paragraph = ' '.join(clean_text(paragraph))
        clean_paragraph = clean_text(paragraph['text'])
        cleaned_paragraphs.append({'tokens': clean_paragraph, 'link': paragraph['link']})

    # Update progress bar
    progress.value = i + 1


Scraping paragraphs from each link <br>

IntProgress(value=0, max=70)

Αποθήκευση επιλεγμένων παραγράφων στην αρχική και επεξεργασμένη μορφή τους.

In [8]:
store_things(original_paragraphs, 'wikipedia_paragraphs.json')
store_things(cleaned_paragraphs, 'wikipedia_paragraphs_cleaned.json')

display(Markdown("How some of the paragraphs are saved: <br>"))
for paragraph in original_paragraphs[:10]:
    text = paragraph['text']
    link = paragraph['link']

    print(f"text: {text}")
    print(f"link: {link}\n\n")
    

How some of the paragraphs are saved: <br>

text: Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

link: https://en.wikipedia.org/wiki/Data_analysis


text: The data is necessary as inputs to the analysis, which is specified based upon the requirements of those directing the analytics (or customers, who will use the finished product of the analysis). The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified a

In [9]:
display(Markdown("How some of the processed and tokenised paragraphs are saved: <br>"))
for paragraph in cleaned_paragraphs[:10]:
    tokens = paragraph['tokens']
    link = paragraph['link']

    print(f"tokens: {tokens}\n")
    print(f"link: {link}\n\n")
    

How some of the processed and tokenised paragraphs are saved: <br>

tokens: ['data', 'analysis', 'process', 'inspecting', 'cleansing', 'transforming', 'modeling', 'data', 'goal', 'discovering', 'useful', 'information', 'informing', 'conclusion', 'supporting', 'data', 'analysis', 'multiple', 'facet', 'approach', 'encompassing', 'diverse', 'technique', 'variety', 'name', 'used', 'different', 'business', 'science', 'social', 'science', 'domain', 'today', 'business', 'world', 'data', 'analysis', 'play', 'role', 'making', 'decision', 'scientific', 'helping', 'business', 'operate', 'effectively']

link: https://en.wikipedia.org/wiki/Data_analysis


tokens: ['data', 'necessary', 'input', 'analysis', 'specified', 'based', 'upon', 'requirement', 'directing', 'analytics', 'customer', 'use', 'finished', 'product', 'analysis', 'general', 'type', 'entity', 'upon', 'data', 'collected', 'referred', 'experimental', 'unit', 'person', 'population', 'people', 'specific', 'variable', 'regarding', 'population', 'age', 'income', 'may', 'specified', 'obtained', 'data', 'may'

# 3. Ευρετήριο (Indexing)

Δημιουργία ανεστραμμένου ευρετήριου από τις επιλεγμένες και επεξεργασμένες παραγράφους και αποθήκευσή του σε αρχείο JSON.

In [10]:
def build_inverted_index(cleaned_paragraphs):
    inverted_index = defaultdict(set)
    
    # Look through each tokenized paragraph and assign a unique ID
    for paragraph_id, paragraph in enumerate(cleaned_paragraphs):
        # Get the tokens from each paragraph
        tokens = paragraph['tokens']
        
        # Add each token to the inverted index with its paragraph ID
        for token in tokens:
            inverted_index[token].add(paragraph_id)
             
    return inverted_index


def store_inverted_index(inverted_index):
    try:
        # Convert sets to lists to save in JSON file
        serializable_index = {}
        for term, paragraph_ids in inverted_index.items():
            serializable_index[term] = list(paragraph_ids)

        # Save the converted index to the file
        with open('inverted_index.json', 'w') as file:
            json.dump(serializable_index, file, indent = 4)

    except Exception as e:
        print(f"____Error saving inverted index____\n{e}")

# Build inverted index from cleaned paragraphs
inverted_index = build_inverted_index(cleaned_paragraphs)

# Store generated inverted index
store_inverted_index(inverted_index)


# Convert the inverted index into a list of token and paragraph IDs
index = list(inverted_index.items())
# Create a data frame from the inverted index
inverted_df = pd.DataFrame(index, columns = ['Token', 'Paragraph ID'])
# Display the inverted index
display(Markdown("Inverted Index <br>"))
pd.set_option('display.max_colwidth', None)
display(inverted_df)


Inverted Index <br>

Unnamed: 0,Token,Paragraph ID
0,data,"{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 532, 31, 38, 43, 44, 48, 49, 50, 51, 52, 54, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 70, 74, 75, 77, 79, 80, 81, 82, 83, 86, 87, 88, 89, 526, 92, 93, 95, 97, 101, 102, 108, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 134, 135, 136, 142, 143, 144, 145, 146, 149, 150, 151, 159, 161, 163, ...}"
1,analysis,"{0, 1, 514, 5, 7, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 528, 529, 27, 35, 36, 40, 44, 59, 61, 63, 65, 66, 67, 75, 77, 78, 82, 92, 113, 116, 120, 121, 122, 123, 124, 127, 134, 135, 137, 142, 143, 145, 146, 152, 168, 172, 173, 178, 180, 203, 212, 255, 297, 299, 304, 318, 319, 323, 340, 350, 360, 363, 387, 400, 401, 410, 429, 432, 435, 449, 454, 472, 475, 487, 510}"
2,process,"{0, 512, 132, 5, 516, 265, 12, 13, 268, 269, 270, 271, 273, 275, 276, 21, 277, 23, 278, 279, 26, 27, 280, 282, 31, 419, 429, 56, 326, 333, 80, 339, 468, 87, 347, 350, 479, 480, 481, 401, 486, 402, 377}"
3,inspecting,{0}
4,cleansing,{0}
...,...,...
4958,simulated,{532}
4959,instability,{532}
4960,implicit,{533}
4961,demanding,{533}


# 4. Μηχανή αναζήτησης (Search Engine) 

α) Επεξεργασία ερωτήματος (Query Processing)

Μετατροπή ερωτήματος από infix σε postfix μορφή και ό,τι δεν συμπεριλαμβάνεται στην άλγεβρα Boole, γίνεται lemmatized. 

In [11]:
def infix_to_postfix(query):
    # Define operator precedence (higher value means higher precedence)
    precedence = {'NOT': 3, 'AND': 2, 'OR': 1}
    
    # Output and operator stack
    output = []  
    operators = []

    # Split the query into tokens
    tokens = query.split()
    
    # Process each token
    for token in tokens:
        # If the token is an operator handle based on precedence
        if token in precedence:
            # Pop operators with higher or equal precedence from the stack
            while operators and precedence.get(operators[-1], 0) >= precedence[token]:
                output.append(operators.pop())
            # Push the current operator to the stack
            operators.append(token)  

        # If the token is left parenthesis push it onto the stack
        elif token == '(':
            operators.append(token)

        # If the token is right parenthesis pop until the matching left parenthesis
        elif token == ')':
            while operators and operators[-1] != '(':
                output.append(operators.pop())
            # Remove left parenthesis from stack
            operators.pop()  

        # If the token is a word or other charachters 
        else:
            # Lemmatize and turn into lowecase
            token = lemmatizer.lemmatize(token.lower())
            # Rremove word or number charachters
            if token.isalnum():
                output.append(token)

    # Pop any remaining operators from the stack and append them to output
    while operators:
        output.append(operators.pop())

    # Return the query in postfix notation
    return output


display(Markdown("Example of how queries are processed <br>"))
test_query = "( data AND analysis ) OR NOT science"
# Convert the query from infix to postfix notation
test_postfix_query = infix_to_postfix(test_query)

# Print the original query and its postfix conversion
print(f"Original queryn (Infix):   {test_query}")
print(f"Converted query (Postfix): {(test_postfix_query)}\n")


Example of how queries are processed <br>

Original queryn (Infix):   ( data AND analysis ) OR NOT science
Converted query (Postfix): ['data', 'analysis', 'AND', 'science', 'NOT', 'OR']



Συλλογή παραγράφων με βάση το ήδη επεξεργασμένο ερώτημα. 

In [12]:
def evaluate_postfix(postfix_tokens, inverted_index, num_paragraphs):
    # Evaluating the postfix expression
    stack = []  
    
    # Handling NOT operations
    all_paragraphs = set(range(num_paragraphs))  

    # Look through each token in the expression and preform nessesary operations
    for token in postfix_tokens:
        if token == 'AND':
            # Pop the top two sets
            right = stack.pop()  
            left = stack.pop()
            # Push the result of addition to stack
            stack.append(left & right)  

        elif token == 'OR':
            right = stack.pop()
            left = stack.pop()
            # Push the result of union to stack
            stack.append(left | right)  

        # If the token is NOT operator calculate the difference 
        elif token == 'NOT':
            operand = stack.pop()
            # Push ducuments that are not in list to stack
            stack.append(all_paragraphs - operand)  

        # If the token is search term retrieve the matching paragraph IDs from inverted index
        else:
            # Push matching paragraph IDs to stack
            stack.append(inverted_index.get(token, set())) 

    # Return the matching paragraph IDs if there are any or empty set
    if stack:
        return stack.pop()
    else:
        return set()


test_matching_paragraphs = evaluate_postfix(test_postfix_query, inverted_index, len(cleaned_paragraphs))
display(Markdown("Matching paragraph IDs for example query:"))
print(f"{test_matching_paragraphs}")


Matching paragraph IDs for example query:

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 42, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 116, 117, 120, 121, 122, 123, 124, 126, 127, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 158, 159, 160, 161, 162, 164, 166, 168, 169, 171, 172, 173, 176, 177, 178, 180, 181, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244

<br>β) Κατάταξη αποτελεσμάτων (Ranking)

Δημιουργία πίνακα TF-IDF με βάση τα αποτελέσματα του ερωτήματος. 

In [13]:
def tf_idf(results, cleaned_paragraphs):
    # Return nothing if there are no results
    if not results:
        return None, [], [], None

    # Store the filtered paragraphs and their IDs
    filtered_paragraphs = []
    filtered_ids = []

    # Extract the text and IDs of paragraphs that match the query results
    for paragraph_id in results:
        # Join the tokens of each paragraph into a single string for processing
        filtered_paragraphs.append(' '.join(cleaned_paragraphs[paragraph_id]['tokens']))
        filtered_ids.append(paragraph_id)

    # Initialize and compute the TF-IDF matrix of the paragraphs
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(filtered_paragraphs)

    # Return the TF-IDF matrix, paragraphs, their IDs and TF-IDF vectorizer
    return tfidf_matrix, filtered_paragraphs, filtered_ids, vectorizer
    

Κατάταξη αποτελεσμάτων με τον αλγόριθμο Vector Space Model και επιστροφή της παραγράφου με μεγαλύτερη τη βαθμολόγηση από κάθε σχετικό σύνδεσμο.

In [14]:
def vector_space_model(cleaned_query, tfidf_matrix, original_paragraphs, filtered_ids, vectorizer):
    # Join the query tokens into a string
    cleaned_query = ' '.join(cleaned_query)

    # Transform the query into a TF-IDF vector using the TF-IDF vectorizer
    query_vector = vectorizer.transform([cleaned_query])

    # Compute the cosine similarity between the query and the TF-IDF matrix
    cosine_similarities = cosine_similarity(query_vector, tfidf_matrix)[0]

    # Store the paragraph with the biggest score for each link
    link_top_scores = {}

    # Look through each paragraph and its similarity score
    for paragraph_id, score in zip(filtered_ids, cosine_similarities):
        # Retrieve the original paragraph data to print
        original = original_paragraphs[paragraph_id]
        # Extract the link of current paragraph
        link = original['link']  

        # If the link is new or the score is higher than the existing one update the record
        if link not in link_top_scores or score > link_top_scores[link]['score']:
            link_top_scores[link] = {
                'paragraph_id': paragraph_id,
                'link': link,
                'text': original['text'],
                'score': score
            }

    # Sort the results in descending order
    sorted_scores = sorted(
        link_top_scores.values(),
        key = lambda x: x['score'],
        reverse = True
    )

    # Return highest ranked paragraphs 
    return sorted_scores


Κατάταξη αποτελεσμάτων με τον αλγόριθμο Okapi BM25 και επιστροφή της παραγράφου με τη μεγαλύτερη βαθμολόγηση από κάθε σχετικό σύνδεσμο.

In [15]:
def okapi_bm25(cleaned_query, filtered_paragraphs, filtered_ids, original_paragraphs):
    # Tokenize the filtered paragraphs
    tokenized_paragraphs = []

    # For every paragraph
    for paragraph in filtered_paragraphs:
        # Split into tokens
        tokens = paragraph.split(' ')
        tokenized_paragraphs.append(tokens)

    # Initialize BM25 Okapi and fit it on the tokenized paragraphs
    bm25 = BM25Okapi(tokenized_paragraphs)

    # Compute the BM25 scores for the query
    bm25_scores = bm25.get_scores(cleaned_query)

    # Keep track of the highest ranked paragraph for each link
    link_top_scores = {}

    # Look through the filtered paragraph IDs and their BM25 scores
    for paragraph_id, score in zip(filtered_ids, bm25_scores):
        # Retrieve original paragraph data for printing
        original = original_paragraphs[paragraph_id]
        link = original['link']

        # Store the paragraph only if it has the highest score for this link
        if link not in link_top_scores or score > link_top_scores[link]['score']:
            link_top_scores[link] = {
                'paragraph_id': paragraph_id,
                'link': link,
                'text': original['text'],
                'score': score
            }

    # Sort the results descending order
    sorted_scores = sorted(
        link_top_scores.values(),
        key = lambda x: x['score'],
        reverse = True
    )

    # Return highest ranked paragraphs
    return sorted_scores


Δημιουργία μηχανής αναζήτησης (Search Engine) με τη δυνατότητα επιλογής αλγόριθμου αναζήτησης και εισαγωγή ερωτήματος από τον χρήστη. 

In [16]:
def search_engine(inverted_index, original_paragraphs, cleaned_paragraphs):
    # Select the ranking algorithm
    toggle_buttons = widgets.ToggleButtons(
        options = ['Boolean retrieval', 'Vector Space Model', 'Okapi BM25'],
        description = 'Select ranking algorithm'
    )
    
    space = widgets.HTML(value = '<br>')

    # Enter search queries
    input_text = widgets.Text(
        placeholder = 'Input search query here',
        layout = widgets.Layout(width = '70%')
    )
    
    # Button to activate searching
    search_button = widgets.Button(
        description = 'Search',
        button_style = 'primary'
    )
    
    # Display search results
    output = widgets.Output()

    # Handles the search when the button is clicked
    def search(b):
        # Clear previous results
        output.clear_output()  
        # Get the selected ranking algorithm
        algorithm = toggle_buttons.value  
        # Get query that user entered
        query = input_text.value  

        # Convert the query to postfix and evaluate using the inverted index
        postfix_query = infix_to_postfix(query)
        results = evaluate_postfix(postfix_query, inverted_index, len(cleaned_paragraphs))
        
        # Apply TF-IDF transformation on the resulting paragraphs
        tfidf_matrix, filtered_paragraphs, filtered_ids, vectorizer = tf_idf(results, cleaned_paragraphs)
        # Process query for ranking
        cleaned_query = clean_text(query)  

        # Display search results based on the selected ranking algorithm
        with output:
            if filtered_paragraphs:
                print(f"Total matching paragraphs: {len(filtered_ids)}\n")

                # Display results with Boolean retrieval
                if algorithm == 'Boolean retrieval':
                    displayed_links = set()
                    for i, paragraph_id in enumerate(filtered_ids):
                        original = original_paragraphs[paragraph_id]
                        if original['link'] not in displayed_links:
                            displayed_links.add(original['link'])
                            print(f"Link: {original['link']}\n{original['text']}")
                    print(f"Total links shown: {len(displayed_links)}\n")

                # Display ranked results with VSM
                elif algorithm == 'Vector Space Model':
                    ranked_results = vector_space_model(cleaned_query, tfidf_matrix, original_paragraphs, filtered_ids, vectorizer)
                    displayed_links = set()
                    for result in ranked_results:
                        displayed_links.add(result['link'])
                        print(f"Link: {result['link']}      (Score: {result['score']:.3f})\n{result['text']}")
                    print(f"Total links shown: {len(displayed_links)}\n")
                
                # Display ranked results with okapi BM25
                elif algorithm == 'Okapi BM25':
                    ranked_results = okapi_bm25(cleaned_query, filtered_paragraphs, filtered_ids, original_paragraphs)
                    displayed_links = set()
                    for result in ranked_results:
                        displayed_links.add(result['link'])
                        print(f"Link: {result['link']}      (Score: {result['score']:.3f})\n{result['text']}")
                    print(f"Total links shown: {len(displayed_links)}\n")
                
            else:
                # If no results match the query
                print(f"No results found for '{query}' using {algorithm}.")

    # Connect the search button to the search function
    search_button.on_click(search)

    # Display widgets
    display(widgets.VBox([toggle_buttons, space, input_text, search_button, output]))

search_engine(inverted_index, original_paragraphs, cleaned_paragraphs)


VBox(children=(ToggleButtons(description='Select ranking algorithm', options=('Boolean retrieval', 'Vector Spa…

# 5. Αξιολόγηση συστήματος

Εισαγωγή και επεξεργασία δεδομένων από το CISI dataset. 

In [17]:
# Read and process CISI documents
def read_documents():
    # File path to the CISI documents
    fp = '/Users/vivh/ergasia/cisi/CISI.ALL/CISI.ALL'

    with open(fp, 'r') as f:
        # To merge content across lines
        merged = ''  

        # Read file line by line and merge content while preserving field tags
        for a_line in f:
            # Identify field tags (.I, .X)
            if a_line.startswith('.'):  
                merged += '\n' + a_line.strip()
            else:
                # Add text to the merged content
                merged += ' ' + a_line.strip()  

    # Store processed documents
    documents = []  
    # Temporary storage for document text
    content = ''  
    # Temporary storage for document ID
    doc_id = '' 

    # Process content line by line
    for a_line in merged.split('\n'):
        # New document identifier
        if a_line.startswith('.I'):
            # Save previous document
            if doc_id and content:  
                documents.append({'id': doc_id, 'text': content.strip()})

            # Extract document ID
            doc_id = a_line.split(' ')[1].strip()  
            # Reset for the new document
            content = ''  
        
        # End of document identifier    
        elif a_line.startswith('.X'): 
            if doc_id and content:
                documents.append({'id': doc_id, 'text': content.strip()})
                
            doc_id = ''
            content = ''
            
        else:
            # Add the content excluding the tags
            content += a_line[3:].strip() + ' '

    # Last document in the file
    if doc_id and content:
        documents.append({'id': doc_id, 'text': content.strip()})
    
    # Save processed documents to a JSON file
    store_things(documents, 'cisi_documents.json')
    return documents


# Read and process CISI queries
def read_queries():
    fp = '/Users/vivh/ergasia/cisi/CISI.QRY'

    with open(fp, 'r') as f:
        merged = '' 

        for a_line in f:
            if a_line.startswith('.'):
                merged += '\n' + a_line.strip()
            else:
                merged += ' ' + a_line.strip() 

    queries = [] 
    content = ''  
    qry_id = '' 

    for a_line in merged.split('\n'):
        if a_line.startswith('.I'):
            if qry_id and content: 
                queries.append({'id': qry_id, 'text': content.strip()})
                
            qry_id = a_line.split(' ')[1].strip() 
            content = '' 
        
        elif a_line.startswith('.W') or a_line.startswith('.T'): 
            content += a_line.strip()[3:] + ' '

    if qry_id and content:
        queries.append({'id': qry_id, 'text': content.strip()})

    store_things(queries, 'cisi_queries.json')
    return queries


# Function to read and process CISI relevance mappings
def read_mappings():
    fp = '/Users/vivh/ergasia/cisi/CISI.REL'

    with open(fp, 'r') as f:
        # Store processed mappings
        mappings = []  

        # Read file line by line
        for a_line in f:
            # Split the line into components
            voc = a_line.strip().split()  
            # Extract query ID
            qry_id = voc[0].strip() 
            # Extract document ID
            doc_id = voc[1].strip()  

            # Append the mapping as a dictionary
            mappings.append({'query_id': qry_id, 'doc_id': doc_id})

    store_things(mappings, 'cisi_mappings.json')
    return mappings


In [18]:
documents = read_documents()

display(Markdown("How some of the CISI documents are saved: <br>"))
for paragraph in documents[:10]:
    d_id = paragraph['id']
    text = paragraph['text']

    print(f"id: {d_id}")
    print(f"text: {text}\n")

How some of the CISI documents are saved: <br>

id: 1
text: 18 Editions of the Dewey Decimal Classifications Comaromi, J.P. The present study is a history of the DEWEY Decimal Classification.  The first edition of the DDC was published in 1876, the eighteenth edition in 1971, and future editions will continue to appear as needed.  In spite of the DDC's long and healthy life, however, its full story has never been told.  There have been biographies of Dewey that briefly describe his system, but this is the first attempt to provide a detailed history of the work that more than any other has spurred the growth of librarianship in this country and abroad.

id: 2
text: Use Made of Technical Libraries Slater, M. This report is an analysis of 6300 acts of use in 104 technical libraries in the United Kingdom. Library use is only one aspect of the wider pattern of information use.  Information transfer in libraries is restricted to the use of documents.  It takes no account of documents used outside the library, still less of information tra

In [19]:
queries = read_queries()  

display(Markdown("How some of the CISI queries are saved: <br>"))
for paragraph in queries[:10]:
    d_id = paragraph['id']
    text = paragraph['text']

    print(f"id: {d_id}")
    print(f"text: {text}\n")

How some of the CISI queries are saved: <br>

id: 1
text: What problems and concerns are there in making up descriptive titles? What difficulties are involved in automatically retrieving articles from approximate titles? What is the usual relevance of the content of articles to their titles?

id: 2
text: How can actually pertinent data, as opposed to references or entire articles themselves, be retrieved automatically in response to information requests?

id: 3
text: What is information science?  Give definitions where possible.

id: 4
text: Image recognition and any other methods of automatically transforming printed text into computer-ready form.

id: 5
text: What special training will ordinary researchers and businessmen need for proper information management and unobstructed use of information retrieval systems? What problems are they likely to encounter?

id: 6
text: What possibilities are there for verbal communication between computers and humans, that is, communication via the spoken word?

id: 7
text: Describe presently w

In [20]:
mappings = read_mappings()

display(Markdown("How CISI mappings are saved: <br>"))
# Convert mappings to data frame
mappings_df = pd.DataFrame(mappings)
pd.set_option('display.max_colwidth', None)
display(mappings_df)


How CISI mappings are saved: <br>

Unnamed: 0,query_id,doc_id
0,1,28
1,1,35
2,1,38
3,1,42
4,1,43
...,...,...
3109,111,422
3110,111,448
3111,111,485
3112,111,503


Υπολογισμός ακρίβειας, ανάκλησης, F1-score και μέσης ακρίβειας με αγλόριθμο αναζήτησης Okapi BM25 για τα έγγραφα και ερωτήματα του CISI dataset. 

In [21]:
def testing(documents, queries, mappings):
    # Stores cleaned documents
    cleaned_paragraphs = {}

    for doc in documents:
        doc_id = doc['id']
        # Clean and tokenize the document text, then store it
        cleaned_text = clean_text(doc['text'])  
        cleaned_paragraphs[doc_id] = {'tokens': cleaned_text}

    # Build the inverted index
    inverted_index = defaultdict(set)
    for doc_id, doc in cleaned_paragraphs.items():
        for term in doc['tokens']:  # Use 'tokens' instead of 'text'
            inverted_index[term].add(doc_id)

    # Store average precision for each query
    average_precisions = []

    for query in queries:
        query_id = query['id']
        query_text = query['text']

        # Preprocess the query text
        cleaned_query = ' '.join(clean_text(query_text))
      
        # Convert the query to postfix and evaluate using the inverted index
        postfix_query = infix_to_postfix(cleaned_query)
        results = evaluate_postfix(postfix_query, inverted_index, len(cleaned_paragraphs))

        # Apply TF-IDF transformation on the resulting paragraphs
        tfidf_matrix, filtered_paragraphs, filtered_ids, vectorizer = tf_idf(results, cleaned_paragraphs)

        if not filtered_paragraphs:  
            print(f"\nQuery ID: {query_id}")
            print(f"Query: {query_text}")
            print("No matches found.\n")
            average_precisions.append(0)
            continue

        # Use TF-IDF results as input to BM25
        tokenized_docs = []
        for doc_id in filtered_ids:
            tokenized_docs.append(cleaned_paragraphs[doc_id]['tokens'])
    
        bm25 = BM25Okapi(tokenized_docs)    
        bm25_scores = bm25.get_scores(postfix_query)

        # Rank documents based on BM25 scores
        ranked_results = sorted(
            zip(filtered_ids, bm25_scores),
            key = lambda x: x[1],
            reverse = True
        )

        # Extract retrieved document IDs with scores > 0
        retrieved_docs = []
        for doc_id, score in ranked_results:
            if score > 0:
                retrieved_docs.append(doc_id)

       # Find relevant documents for this query
        relevant_docs = set()
        for mapping in mappings:
            if mapping['query_id'] == query_id:
                relevant_docs.add(mapping['doc_id'])
        
        # Calculate precision at each relevant document's position
        true_positives = 0
        precisions = []

        for i, doc_id in enumerate(retrieved_docs, start = 1):
            if doc_id in relevant_docs:
                true_positives += 1
                precision_at_k = true_positives / i
                precisions.append(precision_at_k)

        if retrieved_docs:
            precision = true_positives / len(retrieved_docs)
        else:
            precision = 0
        
        if relevant_docs:
            ap = sum(precisions) / len(relevant_docs)
            recall = true_positives / len(relevant_docs)
        else:
            ap = 0
            
        average_precisions.append(ap)

        if precision + recall > 0:
            f1_score = (2 * precision * recall) / (precision + recall)
        else:
            f1_score = 0;

        # Print metrics for the current query
        display(Markdown("Query"))
        print(f"{query_text}")
        display(Markdown("Matching document IDs"))
        print(f"{retrieved_docs}")
        display(Markdown(f"Precision: {precision:.3f}"))
        display(Markdown(f"Recall: {recall:.3f}"))
        display(Markdown(f"F1-Score: {f1_score:.3f}<br><br>"))

    # Calculate Mean Average Precision 
    map_score = sum(average_precisions) / len(average_precisions) if average_precisions else 0
    display(Markdown(f"Mean Average Precision: {map_score:.3f}"))
    

display(Markdown("Using the CISI dataset to evaluate search engine<br>"))
testing(documents, queries, mappings)


Using the CISI dataset to evaluate search engine<br>

Query

What problems and concerns are there in making up descriptive titles? What difficulties are involved in automatically retrieving articles from approximate titles? What is the usual relevance of the content of articles to their titles?


Matching document IDs

['429', '1299', '1421', '1055', '722', '666', '76', '1090', '1281', '676', '38', '413', '64', '759', '928', '1195', '65', '212', '510', '799', '1265', '589', '541', '813', '195', '1118', '848', '1091', '1124', '1009', '154', '582', '1000', '953', '767', '576', '992', '803', '851', '831', '978', '1230', '1369', '603', '465', '1449', '276', '655', '650', '783', '711', '219', '620', '894', '1432', '869', '820', '52', '201', '338', '524', '269', '415', '483', '196', '1418', '1064', '609', '482', '886', '466', '86', '1002', '322', '192', '225', '726', '1436', '1286', '1164', '1162', '604', '757', '53', '680', '204', '150', '776', '1349', '788', '1089', '906', '221', '402', '495', '875', '215', '920', '811', '854', '193', '651', '493', '863', '921', '40', '861', '715', '775', '246', '333', '1373', '189', '1028', '1197', '1227', '1196', '1272', '865', '981', '904', '354', '614', '403', '588', '687', '551', '90', '919']


Precision: 0.302

Recall: 0.848

F1-Score: 0.446<br><br>

Query

How can actually pertinent data, as opposed to references or entire articles themselves, be retrieved automatically in response to information requests?


Matching document IDs

['381', '597', '1156', '1055', '862', '1078', '892', '526', '1352', '488', '797', '788', '339', '1155', '596', '1158', '1363', '898', '1103', '552', '10', '1138', '1126', '660', '1118', '1120', '562', '711', '783', '1170', '1147', '207', '1130', '451', '695', '218', '223', '1124', '483', '309', '58', '773', '644', '484', '394', '165', '551', '1231', '1327', '871', '706', '891', '382', '1394', '550', '271', '479', '71', '188']


Precision: 0.034

Recall: 0.077

F1-Score: 0.047<br><br>

Query

What is information science?  Give definitions where possible.


Matching document IDs

['1181', '1077', '160', '599', '1037', '1277', '1249', '158', '837', '1169', '899', '496', '1255', '163', '1198', '123', '845', '592', '1445', '582', '1258', '1455', '1309', '1116', '784', '29', '1330', '900', '346', '57', '1016', '1095', '1373', '1241', '671', '1339', '481', '958', '241', '74', '839', '429', '853', '826', '1410', '898', '1013', '591', '1273', '685', '718', '478', '488', '568', '1404', '815', '893', '228', '78', '1266', '1201', '97', '1160', '157', '922', '446', '1082', '801', '774', '1342', '244', '542', '269', '622', '1347', '1393', '379', '1265', '809', '595', '811', '1371', '657', '1231', '816', '46', '194', '915', '1433', '913', '887', '17', '1354', '758', '168', '1379', '645', '1194', '1429', '285', '715', '775', '365', '82', '124', '307', '326', '1072', '1457', '530', '779', '1021', '1239', '1418', '731', '10', '1079', '945', '5', '738', '71', '58', '896', '1066', '1075', '564', '476', '977', '399', '1450', '208', '795', '68', '500', '52', '666', '16']


Precision: 0.036

Recall: 0.114

F1-Score: 0.055<br><br>

Query

Image recognition and any other methods of automatically transforming printed text into computer-ready form.


Matching document IDs

['739', '320', '601', '527', '420', '1341', '421', '80', '556', '252', '980', '1252', '653', '495', '1399', '26', '571', '862', '672', '231', '581', '596', '376', '1105', '94', '484', '648', '351', '1191', '1190', '530', '809', '521', '476', '769', '1042', '473', '875', '50', '42', '748', '19', '516', '478', '953', '1092', '962', '758', '1020', '611', '802', '390', '731', '552', '151', '798', '208', '58', '3', '1014', '263', '1371', '1396', '922', '686', '1165', '787', '150', '1004', '796', '36', '1316', '1415', '907', '628', '855', '402', '1229', '415', '1104', '353', '1046', '1352', '431', '1056', '873', '1247', '886', '1013', '872', '1160', '305', '715', '725', '383', '965', '993', '1136', '331', '789', '1189', '1333', '770', '635', '1378', '874', '621', '180', '130', '632', '191', '109', '1183', '1256', '826', '67', '1423', '585', '10', '618', '1152', '397', '848', '904', '528', '1427', '952', '1335', '600', '12', '310', '165', '399', '825', '728', '692', '400', '735', '1261', '144

Precision: 0.027

Recall: 0.500

F1-Score: 0.052<br><br>

Query

What special training will ordinary researchers and businessmen need for proper information management and unobstructed use of information retrieval systems? What problems are they likely to encounter?


Matching document IDs

[]


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

What possibilities are there for verbal communication between computers and humans, that is, communication via the spoken word?


Matching document IDs

['1045', '400', '386', '1144', '341', '49', '26', '79', '688', '1382', '581', '542', '212', '1240', '807', '667', '419', '489', '657', '131', '517', '680', '160', '498', '150', '666', '820', '563', '636', '1323', '562', '661', '677', '1392', '466', '566', '564', '321', '1175', '512', '649', '329', '687', '577', '643', '1091', '71', '589', '478', '600', '835', '51', '38', '653', '315', '420', '1215', '1381', '676', '1366', '450', '1222', '1388', '402', '662', '620', '1267', '756', '576', '527', '421', '363', '761', '233', '1314', '447', '644', '350', '483', '390', '552', '1358', '791', '946', '476', '1362', '480', '507', '608', '798', '592', '1261', '68', '571', '1265', '1309', '1124']


Precision: 0.010

Recall: 1.000

F1-Score: 0.020<br><br>

Query

Describe presently working and planned systems for publishing and printing original papers by computer, and then saving the byproduct, articles coded in data-processing form, for further use in retrieval.


Matching document IDs

['446', '484', '135', '1364', '615', '675', '1072', '611', '728', '26', '508', '461', '571', '165', '1248', '512', '376', '159', '67', '534', '790', '1180', '1179', '610', '389', '1391', '731', '648', '706', '375', '530', '664', '986', '514', '463', '1171', '490', '120', '1136', '993', '501', '606', '1323', '148', '252', '1264', '687', '429', '528', '704', '644', '473', '617', '641', '28', '686', '826', '565', '898', '703', '630', '156', '659', '798', '243', '454', '451', '1092', '445', '866', '562', '175', '627', '459', '58', '925', '495', '526', '690', '381', '889', '388', '1009', '378', '1078', '478', '596', '779', '754', '481', '179', '1191', '636', '634', '1190', '590', '660', '518', '327', '594', '498', '523', '114', '707', '600', '762', '681', '197', '1120', '321', '319', '839', '637', '670', '129', '71', '773', '737', '531', '516', '620', '458', '1259', '538', '1170', '1305', '68', '1419', '126', '689', '625', '591', '746', '448', '1448', '716', '857', '595', '785', '806', '566

Precision: 0.007

Recall: 0.250

F1-Score: 0.014<br><br>

Query

Describe information retrieval and indexing in other languages. What bearing does it have on the science in general?


Matching document IDs

['159', '434', '52', '1010', '595', '456', '1124', '1443', '590', '825', '1024', '1175', '168', '702', '746', '148', '644', '82', '1448', '257', '388', '514', '477', '195', '888', '1421', '19', '199', '500', '134', '1142', '1427', '1222', '47', '1118', '857', '709', '73', '519', '1430', '597', '1169', '692', '497', '631', '320', '269', '160', '85', '1067', '650', '463', '838', '439', '1161', '25', '553', '309', '534', '158', '1259', '468', '1391', '319', '1249', '706', '484', '1407', '557', '1273', '1113', '845', '129', '819', '1068', '610', '398', '414', '1348', '823', '1349', '1262', '1442', '1201', '784', '582', '1318', '701', '6', '1398', '263', '1342', '1245', '568', '335', '54', '1084', '1447', '246', '787', '229', '1351', '118', '667', '540', '12', '882', '1384', '261', '972', '1292', '306', '1204', '1231', '4', '1401', '1241', '740', '699', '292', '1105', '563', '767', '654', '1203', '930', '880', '206', '929', '265', '1381', '331', '161', '242', '1256', '43', '849', '1018', '8

Precision: 0.011

Recall: 0.111

F1-Score: 0.021<br><br>

Query

What possibilities are there for automatic grammatical and contextual analysis of articles for inclusion in an information retrieval system?


Matching document IDs

['1120', '571', '135', '175', '565', '179', '1144', '1136', '483', '986', '1327', '212', '309', '446', '830', '682', '72', '1419', '989', '490', '591', '572', '291', '497', '482', '916', '1298', '126', '114', '71', '534', '691', '1012', '421', '630', '1092', '606', '27', '615', '1437', '454', '617', '564', '1121', '517', '67', '180', '786', '17', '1173', '381', '73', '419', '1124', '676', '601', '494', '267', '28', '595', '64', '1109', '990', '1114', '659', '1225', '458', '538', '865', '621', '626', '970', '575', '514', '459', '850', '611', '1255', '762', '1105', '445', '151', '478', '822', '1125', '106', '1024', '158', '1171', '507', '254', '1139', '960', '375', '1170', '474', '1078', '666', '1241', '1229', '648', '567', '501', '481', '1053', '889', '1163', '382', '531', '376', '1427', '1409', '707', '827', '898', '1054', '222', '1038', '815', '513', '526', '544', '25', '165', '1098', '1405', '319', '522', '646', '477', '826', '530', '136', '461', '1179', '434', '703', '287', '120', '

Precision: 0.033

Recall: 0.500

F1-Score: 0.062<br><br>

Query

The use of abstract mathematics in information retrieval, e.g. group theory.


Matching document IDs

['1385', '536', '1411', '229', '558', '471', '25', '664', '1340', '518', '259', '1231', '174', '67', '895', '1315', '1015', '643', '321', '549', '590', '1027', '575', '479', '1219', '73', '1282', '827', '1081', '31', '445', '1387', '1125', '368', '532', '610', '1191', '1173', '644', '803', '1190', '1047', '972', '227', '829', '1187', '1064', '168', '1117', '525', '641', '660', '456', '1365', '557', '1360', '627', '1201', '1045', '1244', '810', '1186', '574', '1161', '1326', '1309', '160', '1233', '17', '308', '228', '1220', '1204', '1348', '343', '397', '1202', '1399', '1248', '1333', '1262', '118', '1443', '1150', '1217', '544', '356', '667', '846', '349', '901', '1398', '93', '1455', '247', '1077', '982', '1119', '824', '1292', '1334', '1160', '647', '758', '1046', '1224', '1227', '1329', '1386', '1357', '226', '572', '542', '396', '1037', '819', '1393', '64', '1427', '1185', '911', '1188', '387', '350', '1339', '195', '62', '335', '1066', '1075', '476', '1311', '291', '19', '320', '

Precision: 0.101

Recall: 0.538

F1-Score: 0.171<br><br>

Query

What is the need for information consolidation, evaluation, and retrieval in scientific research?


Matching document IDs

['474', '575', '965', '128', '484', '560', '383', '1098', '134', '381', '771', '513', '6', '899', '378', '88', '1106', '147', '194', '966', '388', '481', '1432', '440', '1174', '259', '385', '1099', '163', '544', '666', '1120', '688', '1256', '1315', '900', '807', '1170', '1346', '905', '98', '1095', '871', '176', '202', '199', '1418', '148', '1264', '553', '526', '769', '654', '1284', '512', '243', '456', '606', '375', '274', '619', '703', '185', '95', '1009', '1289', '1342', '479', '690', '1110', '27', '1144', '1323', '889', '818', '1179', '537', '763', '891', '962', '1200', '1211', '132', '1197', '426', '462', '4', '1401', '405', '328', '160', '439', '937', '254', '314', '1273', '626', '543', '311', '796', '1154', '896', '667', '151', '174', '1427', '341', '1121', '1082', '1344', '1178', '603', '1352', '129', '1130', '1061', '370', '107', '821', '190', '220', '9', '728', '1444', '600', '266', '704', '1047', '1455', '1348', '360', '1155', '1308', '450', '616', '490', '813', '1408', '

Precision: 0.154

Recall: 0.417

F1-Score: 0.225<br><br>

Query

Give methods for high speed publication, printing, and distribution of scientific journals.


Matching document IDs

['552', '1182', '725', '686', '1108', '1167', '748', '543', '41', '1209', '1157', '696', '767', '691', '1097', '1061', '190', '613', '1418', '778', '616', '1432', '1276', '1460', '1338', '1363', '193', '831', '1335', '258', '113', '203', '110', '889', '573', '618', '225', '198', '763', '821', '253', '624', '759', '905', '1290', '1210', '683', '1177', '635', '820', '560', '804', '845', '1350', '685', '1369', '1355', '735', '580', '721', '37', '199', '97', '200', '770', '1114', '1176', '194', '722', '76', '755', '933', '657', '195', '1055', '150', '210', '1373', '111', '1047', '744', '588', '756', '614', '429', '515', '196', '472', '622', '189', '1060', '977', '1301', '776', '415', '87', '201', '757', '1156', '1090', '183', '204', '506', '638', '1109', '255', '1299', '1396', '1131', '219', '167', '1241', '943', '1023', '1083', '65', '10', '1293', '986', '1352', '623', '466', '112', '1071', '715', '775', '1207', '1147', '676', '1232', '1262', '898', '447', '1014', '355', '973', '161', '38

Precision: 0.028

Recall: 0.308

F1-Score: 0.052<br><br>

Query

What criteria have been developed for the objective evaluation of information retrieval and dissemination systems?


Matching document IDs

['1098', '611', '565', '1120', '646', '866', '481', '1106', '128', '137', '515', '59', '1139', '591', '486', '57', '1078', '889', '222', '1105', '1341', '523', '445', '49', '579', '1375', '474', '1298', '519', '1171', '175', '434', '513', '179', '827', '461', '727', '459', '780', '120', '1264', '490', '375', '571', '690', '625', '615', '135', '123', '1126', '67', '606', '801', '224', '670', '134', '1114', '1143', '575', '348', '525', '18', '826', '213', '986', '1281', '508', '27', '265', '1127', '534', '676', '1367', '72', '244', '421', '74', '514', '378', '482', '553', '309', '448', '504', '159', '1417', '1054', '484', '850', '1012', '327', '243', '728', '1099', '1362', '1175', '458', '538', '381', '373', '376', '1136', '595', '190', '446', '560', '644', '4', '1401', '630', '1358', '1180', '358', '839', '955', '671', '1207', '637', '528', '695', '1125', '254', '731', '590', '1092', '1416', '1170', '659', '1040', '449', '648', '501', '136', '1053', '674', '707', '80', '1038', '386', '8

Precision: 0.117

Recall: 0.659

F1-Score: 0.198<br><br>


Query ID: 14
Query: What future is there for automatic medical diagnosis?
No matches found.



Query

How much do information retrieval and dissemination systems, as well as automated libraries, cost? Are they worth it to the researcher and to industry?


Matching document IDs

['1264', '314', '1114', '327', '126', '801', '1205', '1353', '1269', '623', '507', '1126', '236', '1009', '528', '6', '512', '437', '140', '119', '942', '598', '1106', '941', '730', '166', '683', '1058', '383', '1062', '1451', '1320', '368', '581', '355', '759']


Precision: 0.167

Recall: 0.073

F1-Score: 0.102<br><br>

Query

What systems incorporate multiprogramming or remote stations in information retrieval?  What will be the extent of their use in the future?


Matching document IDs

['1258', '883', '631', '636', '481', '1124', '580', '1346', '993', '716', '131', '621', '907', '348', '1362', '655', '916', '32', '386', '801', '123', '53', '847', '485', '423', '1144', '1290', '72', '661', '491', '1356', '897', '482', '1437', '1251', '795', '134', '1045', '1079', '1241', '846', '977', '1383', '1', '1273', '401', '873', '1090', '24', '1043', '1457', '685', '80', '950', '1294', '915', '310', '1088', '320', '718', '1417', '1388', '1439', '1268', '870', '1354', '1390', '112', '901', '453', '1344', '767', '143', '1429', '922', '902', '1082', '878', '1238', '935', '943', '1438', '17', '367', '1149', '100', '418', '1025', '923', '561', '166', '400', '185', '142', '938']


Precision: 0.021

Recall: 0.077

F1-Score: 0.033<br><br>

Query

Means of obtaining large volume, high speed, customer usable information retrieval output.


Matching document IDs

['252', '512', '319', '625', '1207', '104', '113', '395', '893', '894', '1193', '591', '1229', '103', '371', '1432', '660', '595', '376', '636', '1347', '495', '813', '603', '1299', '608', '102', '694', '880', '709', '77', '150', '1163', '1376', '833', '1371', '897', '1304', '493', '428', '789', '892', '962', '517', '1286', '1230', '308', '1252']


Precision: 0.062

Recall: 0.115

F1-Score: 0.081<br><br>

Query

What methods are there for encoding, automatically matching, and automatically drawing structures extended in two dimensions, like the structural formulas for chemical compounds?


Matching document IDs

['668', '677', '1452', '701', '1261', '690', '671', '694', '592', '150', '687', '709', '890', '1092', '669', '569', '673', '705', '674', '600', '1460', '704', '679', '682', '696', '1286', '472', '838', '1180', '1215', '699', '1414']


Precision: 0.125

Recall: 0.364

F1-Score: 0.186<br><br>

Query

Techniques of machine matching and machine searching systems. Coding and matching methods.


Matching document IDs

['175', '483', '820', '179', '200', '487', '1298', '500', '1292', '1398', '603', '814', '790', '1199', '661', '1252', '890', '521', '737', '1391', '357', '668', '151', '731', '117', '670', '254', '321', '1164', '738', '34', '1162', '705', '190', '708', '316', '1127', '1126', '446', '1416', '422', '563', '527', '17', '1092', '278', '798', '663', '643', '302', '824', '1242', '360', '616', '571', '1450', '429', '317', '327', '945', '5', '341', '633', '1124', '561', '72', '244', '962', '1396', '683', '565', '815', '1191', '373', '1105', '1190', '827', '408', '690', '158', '1125', '309', '1044', '73', '1121', '1419', '889', '424', '530', '451', '1179', '1163', '998', '1326', '1053', '1035', '615', '758', '659', '666', '425', '842', '1360', '611', '1341', '553', '707', '332', '525', '543', '1327', '1098', '822', '423', '475', '1041', '1061', '598', '806', '1057', '377', '865', '1381', '802', '1157', '192', '116', '262', '1382', '4', '1401', '55', '252', '478', '265', '769', '1194', '625', '1

Precision: 0.086

Recall: 0.284

F1-Score: 0.133<br><br>

Query

Testing automated information systems.


Matching document IDs

['827', '134', '336', '595', '1139', '180', '177', '128', '572', '979', '1298', '865', '815', '874', '1114', '860', '433', '895', '174', '916', '406', '868', '990', '644', '986', '136', '1120', '531', '1255', '190', '849', '1175', '1038', '458', '538', '135', '254', '1121', '630', '1092', '517', '1125', '1128', '1136', '611', '574', '1143', '459', '375', '1112', '474', '445', '1077', '535', '530', '1171', '607', '481', '591', '497', '1223', '140', '501', '1053', '707', '27', '565', '67', '615', '532', '175', '1173', '1110', '1170', '179', '158', '74', '621', '1078', '648', '571', '123', '1105', '421', '120', '606', '1080', '826', '319', '526', '1179', '682', '213', '114', '1104', '49', '642', '553', '119', '1293', '703', '1405', '66', '671', '54', '1413', '1106', '434', '1305', '690', '948', '454', '1410', '1191', '490', '347', '523', '1022', '1326', '639', '1289', '64', '525', '1084', '1447', '72', '1361', '1190', '461', '389', '1282', '126', '1012', '57', '1283', '381', '137', '670',

Precision: 0.181

Recall: 0.646

F1-Score: 0.283<br><br>

Query

The need to provide personnel for the information field.


Matching document IDs

['339', '216', '131', '137', '254', '553', '1206', '412', '163', '123', '1094', '134', '125', '471', '497', '1224', '140', '17', '138', '648', '323', '898', '899', '132', '360', '136', '341', '583', '796', '1256', '1128', '1114', '388', '1070', '133', '199', '410', '1424', '60', '1016', '151', '514', '373', '1412', '1164', '1162', '905', '454', '773', '175', '363', '1325', '598', '1076', '1308', '68', '707', '770', '1179', '114', '1081', '1027', '1168', '640', '505', '460', '462', '37', '1432', '722', '107', '461', '405', '1323', '376', '243', '1207', '828', '156', '618', '325', '328', '1271', '419', '1382', '970', '1411', '257', '148', '665', '736', '1149', '1263', '1120', '973', '174', '1009', '821', '439', '676', '1348', '801', '641', '160', '482', '966', '557', '950', '504', '755', '207', '1343', '556', '472', '101', '400', '371', '1075', '791', '1218', '457', '1355', '116', '661', '198', '573', '917', '1230', '272', '344', '915', '1260', '1088', '1418', '636', '1403', '1310', '384

Precision: 0.032

Recall: 0.240

F1-Score: 0.056<br><br>

Query

Automated information in the medical field.


Matching document IDs

['216', '1027', '1114', '410', '174', '215', '148', '136', '203', '221', '1120', '514', '1071', '1128', '133', '131', '137', '163', '60', '373', '1412', '1094', '1164', '1162', '454', '254', '175', '363', '17', '553', '598', '1076', '471', '707', '770', '1179', '497', '323', '114', '1081', '1168', '640', '505', '460', '648', '462', '37', '899', '722', '107', '461', '132', '360', '140', '339', '1323', '1224', '376', '243', '1207', '828', '156', '583', '138', '123', '618', '796', '325', '328', '125', '341', '419', '1382', '970', '1411', '257', '665', '736', '199', '1149', '1263', '134', '1009', '821', '439', '676', '1348', '801', '641', '160', '388', '773', '482', '966', '557', '504', '1256', '755', '412', '151', '1343', '556', '472', '101', '400', '371', '1075', '791', '1218', '457', '1355', '116', '661', '198', '1070', '573', '917', '1230', '272', '344', '915', '1260', '1088', '636', '1403', '905', '1310', '384', '586', '1016', '852', '1248', '906', '1388', '1157', '1332', '510', '68',

Precision: 0.047

Recall: 0.170

F1-Score: 0.074<br><br>

Query

Amount of use of books in libraries. Relation to need for automated information systems .


Matching document IDs

['202', '177', '874', '979', '336', '406', '916', '990', '114', '1073', '378', '947', '459', '593', '535', '1309', '948', '126', '1139', '1053', '1125', '180', '388', '478', '560', '865', '860', '501', '1038', '174', '381', '136', '27', '572', '433', '137', '67', '1298', '358', '408', '670', '458', '1109', '538', '648', '502', '1170', '364', '925', '511', '691', '325', '1114', '449', '1349', '319', '1410', '454', '986', '1361', '641', '1236', '1193', '542', '630', '553', '165', '1110', '582', '262', '461', '267', '1035', '179', '399', '140', '846', '481', '497', '175', '1358', '25', '115', '849', '382', '252', '123', '472', '839', '73', '1171', '445', '17', '594', '644', '1360', '880', '1264', '779', '434', '327', '421', '537', '310', '16', '1007', '660', '1348', '615', '1012', '1120', '4', '1401', '376', '28', '1190', '1362', '132', '1417', '611', '526', '1333', '443', '532', '1080', '213', '1448', '607', '347', '135', '547', '375', '866', '1255', '917', '773', '1173', '664', '490', '

Precision: 0.029

Recall: 0.312

F1-Score: 0.053<br><br>

Query

Educational and training requirements for personnel in the information field. Possibilities for this training.  Needs for programs providing this training.


Matching document IDs

['896', '924', '648', '692', '405', '220', '1166', '1423', '1206', '1239', '1325', '923', '22', '1007', '558', '513', '743', '934', '371', '945', '5', '1275', '858', '1246', '197', '548', '356', '1219', '1387']


Precision: 0.448

Recall: 0.250

F1-Score: 0.321<br><br>

Query

International systems for exchange and dissemination of information.


Matching document IDs

['1284', '440', '1231', '1289', '130', '1245', '1436', '1435', '360', '1362', '1303', '323', '717', '452', '138', '1256', '411', '1378', '611', '98', '1078', '1298', '49', '121', '591', '1412', '1431', '1009', '866', '1153', '490', '1297', '179', '18', '175', '341', '97', '807', '363', '796', '481', '137', '1367', '1281', '711', '513', '460', '947', '241', '497', '375', '633', '228', '941', '311', '629', '1368', '327', '59', '1391', '588', '213', '112', '127', '376', '755', '773', '728', '123', '676', '1396', '889', '80', '801', '482', '12', '1341', '529', '400', '224', '164', '1415', '1125', '607', '1128', '535', '574', '469', '1077', '1112', '1139', '1121', '254', '1158', '1105', '1103', '1116', '474', '66', '85', '1038', '1094', '1156', '1155', '1053', '1191', '1171', '1092', '539', '1190', '1166', '122', '1305', '1223', '540', '599', '458', '1161', '538', '135', '347', '459', '362', '1101', '136', '202', '372', '1096', '1361', '565', '585', '1122', '1405', '152', '640', '630', '166

Precision: 0.034

Recall: 0.667

F1-Score: 0.065<br><br>

Query

Cost and determination of cost associated with systems of automated information.


Matching document IDs

['865', '490', '690', '615', '336', '74', '629', '822', '1100', '639', '839', '584', '158', '497', '591', '671', '321', '433', '27', '623', '214', '842', '367', '1305', '324', '482', '495', '1264', '1151', '1410', '572', '466', '1361', '177', '723', '1139', '1027', '1390', '126', '1297', '180', '704', '17', '348', '1258', '1368', '594', '496', '813', '1298', '1114', '1092', '218', '1421', '851', '737', '334', '174', '1404', '80', '1358', '311', '1132', '1013', '675', '807', '1396', '515', '889', '790', '446', '958', '450', '486', '202', '848', '166', '809', '523', '136', '990', '916', '922', '12', '957', '176', '528', '457', '406', '1196', '674', '811', '1054', '644', '1120', '784', '408', '1124', '986', '694', '426', '400', '1248', '512', '1125', '607', '1128', '535', '574', '469', '1077', '1112', '1121', '175', '254', '375', '137', '1158', '1105', '1103', '1116', '474', '66', '85', '1038', '1094', '1156', '1155', '1053', '1191', '1078', '1171', '539', '1190', '1166', '122', '1223', '

Precision: 0.055

Recall: 0.625

F1-Score: 0.100<br><br>

Query

Computerized information retrieval systems.  Computerized indexing systems.


Matching document IDs

['1293', '1078', '74', '773', '375', '727', '1197', '798', '693', '1193', '461', '865', '565', '376', '257', '1366', '458', '512', '538', '434', '591', '884', '1139', '117', '389', '595', '72', '1448', '1298', '644', '609', '159', '57', '566', '446', '1445', '1136', '590', '660', '1126', '1283', '1419', '630', '611', '507', '459', '615', '889', '478', '445', '381', '348', '135', '1127', '254', '378', '1092', '1170', '1125', '67', '1171', '1120', '501', '504', '1010', '822', '637', '1038', '481', '530', '400', '659', '474', '1012', '827', '796', '826', '497', '648', '606', '319', '1144', '593', '1143', '175', '707', '1053', '1121', '120', '526', '645', '523', '517', '825', '179', '703', '1128', '1255', '966', '1413', '574', '1179', '1124', '690', '709', '27', '114', '140', '1112', '670', '641', '532', '1077', '534', '1223', '514', '510', '535', '579', '525', '607', '484', '451', '309', '158', '682', '779', '594', '1110', '421', '1405', '454', '180', '1282', '1173', '123', '925', '556', 

Precision: 0.132

Recall: 0.591

F1-Score: 0.216<br><br>

Query

Computerized information systems in fields related to chemistry.


Matching document IDs

['696', '156', '116', '676', '1092', '1120', '1164', '1162', '1460', '641', '460', '618', '722', '85', '151', '691', '755', '198', '739', '150', '371', '86', '1072', '953', '705', '731', '735', '743', '619', '1347', '255', '1452', '635', '1275']


Precision: 0.471

Recall: 0.267

F1-Score: 0.340<br><br>

Query

Specific advantages of computerized index systems.


Matching document IDs

['74', '512', '390', '1197', '1136', '49', '1144', '1419', '61', '1293', '701', '80', '595', '1413', '773', '780', '1416', '693', '798', '1193', '647', '1170', '690', '492', '321', '566', '398', '1366', '27', '446', '727', '1078', '465', '213', '825', '406', '257', '472', '884', '865', '117', '630', '44', '682', '1361', '511', '350', '262', '375', '1111', '1339', '165', '1283', '687', '389', '400', '731', '224', '1113', '1080', '386', '517', '243', '741', '531', '1277', '699', '740', '603', '609', '606', '1445', '820', '720', '364', '660', '222', '1326', '677', '1010', '507', '461', '523', '1395', '1261', '318', '779', '476', '1099', '641', '510', '373', '889', '1057', '582', '85', '547', '376', '348', '809', '562', '1418', '522', '1230', '126', '202', '1091', '1024', '830', '1124', '212', '796', '600', '190', '575', '571', '1040', '636', '448', '590', '197', '329', '615', '702', '150', '591', '458', '538', '419', '129', '689', '872', '1038', '1341', '1236', '611', '135', '1007', '709'

Precision: 0.041

Recall: 0.438

F1-Score: 0.075<br><br>

Query

Information dissemination by journals and periodicals.


Matching document IDs

['112', '225', '1114', '1055', '905', '10', '1109', '1097', '198', '494', '199', '1432', '472', '821', '379', '2', '1098', '777', '1168', '210', '790', '765', '1352', '782', '587', '177', '933', '977', '1086', '1260', '865', '792', '951', '1172', '788', '65', '766', '800', '237', '750', '1090', '816', '793', '791', '551']


Precision: 0.578

Recall: 0.194

F1-Score: 0.291<br><br>

Query

Information systems in the physical sciences.


Matching document IDs

['535', '1297', '1370', '1309', '545', '1348', '313', '111', '1179', '804', '137', '85', '1173', '439', '497', '537', '604', '123', '459', '140', '135', '136', '553', '607', '469', '1386', '646', '1405', '1318', '1283', '1022', '652', '786', '599', '1109', '621', '1341', '131', '555', '572', '1338', '640', '60', '456', '1080', '49', '1362', '1181', '837', '1113', '133', '1207', '355', '1094', '513', '243', '598', '1387', '1027', '1169', '172', '372', '132', '373', '1010', '202', '582', '95', '585', '1161', '1436', '544', '126', '803', '1284', '327', '1428', '1051', '557', '866', '1346', '796', '386', '462', '138', '486', '899', '1123', '96', '1303', '363', '496', '163', '199', '914', '592', '338', '505', '1178', '844', '845', '989', '1349', '2', '323', '773', '1198', '616', '371', '339', '460', '1208', '554', '162', '314', '166', '1067', '350', '1339', '1062', '966', '722', '1016', '1249', '1258', '807', '1192', '98', '686', '819', '761', '334', '1085', '1299', '583', '820', '151', '15

Precision: 0.074

Recall: 0.344

F1-Score: 0.121<br><br>

Query

Attempts at computerized and mechanized systems for general libraries. Problems and methods of automated general author and title indexing systems.


Matching document IDs

['336', '865', '849', '64', '553', '644', '451', '265', '1298', '117', '1230', '916', '497', '158', '530', '1448', '820', '1124', '595', '830', '1416', '979', '1000', '434', '582', '676', '1197', '72', '860', '822', '709', '1139', '590', '874', '177', '773', '1024', '1259', '510', '262', '398', '660', '1010', '888', '179', '597', '517', '565', '825', '244', '257', '620', '309', '615', '1419', '159', '192', '654', '1241', '1120', '1067', '690', '889', '4', '1401', '175', '609', '73', '1193', '319', '25', '1105', '335', '54', '884', '557', '796', '180', '389', '150', '476', '406', '18', '1053', '798', '1179', '433', '1293', '1012', '348', '1391', '228', '482', '504', '699', '1126', '603', '483', '354', '880', '593', '1191', '998', '292', '848', '212', '645', '693', '1281', '525', '534', '986', '507', '1127', '826', '659', '1366', '1195', '120', '519', '993', '641', '1418', '376', '1084', '1447', '842', '174', '364', '1196', '484', '990', '514', '606', '67', '666', '1436', '1173', '1362',

Precision: 0.121

Recall: 0.530

F1-Score: 0.197<br><br>

Query

Retrieval systems which provide for the automated transmission of information to the user from a distance.


Matching document IDs

['519', '400', '1365', '1234', '949', '551']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Methods of coding used in computerized index systems.


Matching document IDs

['1180', '1391', '689', '117', '737', '670', '865', '693', '1197', '1366', '595', '390', '731', '798', '508', '309', '72', '1293', '446', '44', '472', '389', '1092', '257', '321', '682', '1024', '1326', '262', '890', '1193', '889', '690', '603', '512', '381', '820', '1035', '318', '677', '376', '510', '1057', '158', '566', '611', '483', '553', '461', '74', '727', '1078', '212', '522', '825', '1419', '49', '773', '1309', '884', '1124', '571', '1261', '641', '1044', '71', '244', '375', '809', '562', '830', '1283', '687', '1416', '291', '179', '1091', '224', '683', '565', '827', '129', '517', '1230', '741', '531', '27', '478', '1277', '699', '740', '190', '175', '815', '609', '1040', '606', '1105', '1445', '200', '701', '720', '842', '660', '1125', '1121', '742', '1191', '1010', '507', '615', '648', '860', '833', '1190', '448', '647', '989', '254', '1126', '1341', '659', '451', '1360', '222', '1179', '1144', '523', '1127', '610', '1395', '325', '1053', '73', '779', '739', '530', '822', '1

Precision: 0.053

Recall: 0.711

F1-Score: 0.098<br><br>

Query

Government supported agencies and projects dealing with information dissemination.


Matching document IDs

['375', '18', '1362', '1298', '889', '481', '360', '341', '717', '1078', '611', '121', '49', '1284', '440', '490', '1297', '591', '175', '363', '179', '137', '1367', '241', '460', '513', '866', '711', '629', '1368', '327', '633', '1281', '376', '138', '123', '311', '1396', '127', '213', '59', '112', '80', '676', '728', '1415', '801', '482', '529', '1256', '1341', '224', '164', '870', '820', '1438', '636', '720', '398']


Precision: 0.034

Recall: 0.047

F1-Score: 0.039<br><br>

Query

What are some of the theories and practices in computer translating of texts from one national language to another?  How can machine translating compete with traditional methods of translating in comprehending nuances of meaning in languages of different structures?


Matching document IDs

['320', '1443', '175', '1046', '19', '1129', '530', '228', '1077', '1118', '419', '817', '637', '1381', '755', '1385', '1020', '746', '1427', '434', '1333', '1180', '1398', '1136', '105', '227', '1386', '343', '218', '936', '387', '25', '1080', '93', '1391', '671', '516', '1313', '1399', '694', '339', '581', '1159', '697', '890', '1387', '1204', '534', '1261', '206', '327', '1340', '954', '686', '816', '461', '1388', '610', '875', '512', '705', '1065', '668', '874', '1128', '1394', '673', '762', '758', '737', '681', '679', '708', '1133', '1346', '704', '544', '1119', '1160', '1092', '675', '606', '670', '669', '715', '972', '678', '501', '1392', '569', '833', '1213', '1345', '700', '95', '803', '511', '706', '616', '923', '1036', '602', '731', '80', '233', '885', '109', '318', '1389', '691', '682', '412', '781', '1110', '1169', '311', '1141', '1068', '15', '1312', '843', '1140', '1177', '709', '695', '1274', '435', '283', '107', '402', '346', '128', '214', '620', '1407', '7', '567', '4

Precision: 0.000

Recall: 0.047

F1-Score: 0.000<br><br>

Query

What lists of words useful for indexing or classifying material are available?  Wanted are lists of terms that are descriptive vocabularies of particular fields or schedules of words that are related to each other in meaningful schemes.  Wanted are lists that have been tested, at least to some extent, and found useful for organizing material and for retrieving it.


Matching document IDs

['641', '676', '636', '814', '674']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

How can access words in an information retrieval system be kept up to date? Word meanings and usage often change and lists must be dynamic to be current. What definitions of the problem and progress toward solutions have been made in providing necessary flexibility in systems of subject headings, index words, or other symbols used for getting at stored data?


Matching document IDs

['1215', '1091', '329', '567', '807', '363', '1366', '68', '310', '523', '661', '1417', '594', '835', '512', '650', '571', '504', '160', '862', '1144', '533', '450', '1415', '167', '44', '527', '1077', '140', '1328', '758', '702', '129', '85', '728', '193', '1277', '373', '809', '502', '1136', '461', '307', '1418', '1369', '1241', '158', '343', '123', '674', '564', '993', '472', '378', '395', '448', '233', '738', '568', '1090', '78', '465', '737', '1001', '186', '703', '976', '115', '1042', '669', '919', '894', '1454', '346', '958', '204', '156', '879', '345', '1377', '552', '400', '376', '125', '507', '611', '330', '841', '757', '409', '1444', '1368', '1174', '905', '290', '516', '1396', '42', '586', '528', '1254', '977', '428', '1399', '1433', '439', '1000', '148', '202', '137', '521', '541', '73', '851', '107', '888', '486', '1039', '720', '442', '1009', '637', '207', '1432', '733', '295', '579', '441', '813', '408', '635', '1350', '873', '778', '727', '185', '987', '742', '496', '8

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

The progress of information retrieval presents problems of maladjustment and dislocation of personnel.  Training and retraining of people to use the new equipment is important at all levels.  Librarians, assistants, technicians, students, researchers, and even executives will need education to learn the purpose, values, and uses of information systems and hardware. What programs have been developed to change the attitudes and skills of traditional workers and help them to learn the newer techniques?


Matching document IDs

['612', '388', '692', '327', '274', '136', '140', '373', '575', '907', '1051', '593', '821', '357', '409', '291', '408', '248', '945', '5', '598', '795', '556', '341', '579', '174', '479', '1130', '1362', '714', '335', '126', '504', '577', '120', '630', '646', '1090', '151', '1317', '1263', '1047', '298', '1144', '133', '592', '370', '1183', '496', '1274', '636', '637', '1242', '273', '711', '798', '643', '606', '132', '1421', '671', '511', '360', '982', '1405', '1361', '278', '359', '121', '1027', '1215', '780', '1207', '642', '660', '30', '1414', '1450', '1416', '446', '94', '967', '424', '361', '1139', '1398', '830', '302', '426', '90', '489', '627', '474', '422', '1092', '835', '824', '922', '495', '563', '737', '571', '801', '561', '493', '1075', '487', '616', '841', '1313', '279', '1126', '1315', '15', '633', '587', '676', '1124', '500', '754', '709', '1127', '466', '731', '684', '810', '959', '1064', '682', '1234', '1319', '1091', '315', '936', '678', '1353', '317', '962', '620'

Precision: 0.006

Recall: 0.056

F1-Score: 0.011<br><br>

Query

What is the status of machine translation?  What progress has been made in the use of computers to transfer from one language to another with some degree of automation?  What problems and stumbling blocks have been found and are they considered to be insurmountable limitations or only challenging to the field of documentation on an international scale?


Matching document IDs

['769', '360', '490', '517', '317', '124', '601', '341', '610', '757', '1031', '482', '483', '682', '1274', '494', '417', '43', '486', '347', '475', '1048', '1098']


Precision: 0.000

Recall: 0.056

F1-Score: 0.000<br><br>

Query

Is alphabetical ordering of material considered to be a useful tool in information retrieval?  What studies have been done to compare the effectiveness of alphabetical order with other organization schemes? Is there a generally accepted form of arranging material in alphabetical order, and is there an easy way of achieving this form without going to a great amount of effort?


Matching document IDs

['62', '351', '93', '962', '353', '520', '132', '1333', '500', '1423', '912', '1372', '1359', '391', '1430', '1408', '1094', '17', '317', '594', '460', '174', '470', '505', '1251', '1277', '994', '1005', '448', '1072', '947', '624', '133', '126', '954', '206', '821', '286', '1449', '91', '6', '921', '463', '94', '889', '496', '154', '908', '1441', '932', '1337', '1079', '1455', '668', '755', '970', '497', '1145', '640', '1379', '948', '347', '1012', '1069', '814', '1429', '416', '977', '572', '250', '885', '1144', '438', '1273', '141', '90', '384', '957', '211', '280', '235', '1017', '558', '884', '1420', '1328', '1016', '409', '343']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

The average student or researcher has difficulty in comprehending the vocabulary of information retrieval.  It appears important that this new field be understood before it is to be fully accepted.  What basic articles would provide an understanding of the various important aspects of the information storage and retrieval?


Matching document IDs

['459', '254', '1179', '575', '123', '461', '460', '630', '1405', '472', '68', '131', '1158', '1259', '591', '1364', '655', '889', '1323', '486', '156', '30', '1448', '664', '606', '462', '611', '375', '839', '160', '454', '446', '644', '497', '176', '114', '120', '780', '590', '1164', '501', '259', '1162', '528', '363', '547', '1248', '458', '538', '252', '135', '471', '525', '174', '376', '490', '496', '728', '1362', '592', '197', '761', '175', '1305', '1130', '126', '381', '670', '641', '1190', '838', '309', '1009', '378', '1125', '1170', '481', '478', '334', '257', '595', '600', '1027', '579', '1180', '1124', '137', '598', '566', '1171', '1282', '1136', '29', '327', '707', '857', '755', '1078', '66', '898', '790', '1191', '1081', '323', '452', '594', '488', '773', '434', '1413', '648', '827', '1255', '539', '660', '28', '67', '966', '686', '1092', '619', '151', '1054', '1053', '617', '807', '429', '565', '636', '703', '1153', '1120', '523', '820', '1264', '986', '688', '321', '690'

Precision: 0.072

Recall: 0.400

F1-Score: 0.122<br><br>

Query

The difficulties encountered in information retrieval systems are often less related to the equipment used than to the failure to plan adequately for document analysis, indexing, and machine coding.  The position of the programmer is to take a problem and write it in a way in which the equipment will understand.  What articles have been written describing research in maximizing the effectiveness of programming?


Matching document IDs

['419', '1328', '477', '561', '637', '291', '350', '515', '690', '432', '1427', '459', '593', '822', '454', '175', '883', '179', '428', '144', '1039', '326', '317', '396']


Precision: 0.125

Recall: 0.231

F1-Score: 0.162<br><br>

Query

There are presently fifty to one hundred technical journals being published.  On the average, two new journals appear every day.  In the many journals published, one to two million articles appear every year.  What attempts have been made to cope with this amount of scientific and technical publication in terms of analysis, control, storage, and retrieval?


Matching document IDs

['686', '986', '560', '243', '44', '429', '898', '1091', '664', '515', '197', '889', '135', '381', '1158', '478', '151', '199', '454', '755', '327', '592', '156', '820', '174', '1255', '838', '805', '1171', '608', '1362', '496', '376', '1197', '388', '680', '894', '497', '603', '1377', '486', '1392', '446', '68', '707', '1072', '798', '1191', '619', '636', '1081', '617', '600', '655', '637', '472', '1009', '807', '731', '817', '769', '323', '460', '1248', '82', '159', '61', '1364', '28', '661', '26', '737', '257', '620', '1419', '471', '727', '516', '126', '634', '321', '448', '463', '627', '826', '1175', '1327', '160', '129', '461', '596', '487', '538', '531', '993', '575', '567', '1054', '1422', '762', '1190', '688', '434', '67', '123', '790', '1164', '871', '1162', '606', '73', '378', '728', '120', '329', '334', '458', '319', '660', '630', '716', '594', '30', '1305', '502', '375', '702', '810', '641', '176', '925', '165', '309', '525', '615', '591', '484', '813', '508', '445', '267'

Precision: 0.079

Recall: 0.142

F1-Score: 0.101<br><br>

Query

I am looking for information about the impact of automation on libraries and its significance for libraries in general.  This includes the increasing importance of automation in view of the proliferation of information today, and how automation can help libraries cope with this problem.  How will automation affect libraries and how should they react to the idea of automation?


Matching document IDs

['178', '141', '177', '281', '916', '1012', '11', '136', '406', '970', '1042', '849', '990', '17', '917', '135', '1280', '287', '1212', '6', '1193', '66', '409', '376', '284', '865', '875', '180', '1419', '299', '517']


Precision: 0.258

Recall: 0.104

F1-Score: 0.148<br><br>


Query ID: 46
Query: I am seeking information on the use of data processing in libraries and the mechanization of routine library processes and procedures.  I would like descriptions of both general and specific applications of automation in such areas as circulation, cataloging, acquisitions, serial records, and other record-keeping.  Examples should be based on the operation of a conventional public or university library, or practices in a special library which could also be applied in a public or university library.  Give descriptions of equipment and operations, both present and projected.
No matches found.



Query

Is there any established means at present for an international exchange of material about information retrieval?  If there is, does it take the form of an international agency or center which regularly distributes information retrieval methods and research results?  If there is not, in what ways has this material crossed national boundaries?  What seem to have been some of the problems blocking a better international exchange, and is any effort being made to solve some of those problems?


Matching document IDs

['1256', '1245', '12', '1441', '796', '1153', '1004', '323', '664', '1000', '769', '760', '634', '1189', '1031', '1431', '351', '1052', '594', '148', '297', '1423', '888', '1021', '98', '525', '889', '497', '710', '575', '264', '541', '606', '1136', '773', '473', '496', '1025', '595', '353', '1041', '394', '1323', '725', '1134', '228', '481', '17', '432', '1201', '1042', '97', '1014', '1248', '43', '1429', '1440', '234', '600', '37', '1419', '907', '703', '331', '345', '321', '1013', '1117', '1053', '1081', '451', '1079', '946', '122', '621', '93', '516', '471', '1398', '1351', '992', '126', '370', '441', '500', '1408', '730', '197', '518', '1215', '528', '1390', '515', '587', '1414', '235', '639', '1160', '948', '839', '1106', '424', '779', '1422', '437', '338', '736', '158', '244', '1328', '1454', '851', '426', '254', '616', '950', '1240', '1082', '314', '160', '114', '958', '561', '1421', '11', '1309', '523', '826', '1070', '157', '1142', '767', '628', '123', '970', '993', '1448', '

Precision: 0.000

Recall: 0.104

F1-Score: 0.000<br><br>

Query

Information retrieval is still such a new and experimental field that a line distinguishing research and practice is often difficult - even impossible - to draw.  Are there, however, actual centers of research on information retrieval?  If so, in which countries are they located?  Who supports them - government, business, universities, or libraries?  Can information retrieval as a specialized research discipline be said to be emerging, or is it still an amalgam of skills from other fields, such as mathematics, engineering, and library science?  In other words, tell me about information retrieval research.


Matching document IDs

['462', '375', '1120', '966', '257', '1219', '1009', '703', '388', '148', '160', '151', '1150', '1183', '381', '769', '1323', '1242', '1264', '163', '479', '905', '174', '512', '17', '1432', '943', '871', '807', '1179', '1328', '116', '619', '328', '946', '259', '1408', '1151', '728', '391', '88', '1197', '378', '575', '260', '1424', '1203', '1170', '1347', '311', '1130', '350', '338', '504', '254', '600', '176', '107', '1442', '690', '889', '526', '474', '385', '606', '982', '243', '1382', '438', '314', '132', '450', '1348', '1454', '341', '688', '199', '949', '915', '96', '439', '813', '821', '556', '1011', '436', '828', '129', '348', '1099', '796', '304', '1146', '964', '351', '1346', '1205', '1315', '1020', '1308', '649', '1023', '1209', '666', '1257', '626', '357', '484', '1178', '1378', '481', '1455', '642', '1403', '4', '1401', '1262', '1372', '559', '1418', '1207', '343', '1425', '985', '961', '1149', '603', '941', '896', '1250', '475', '612', '1095', '95', '353', '1273', '818'

Precision: 0.000

Recall: 0.104

F1-Score: 0.000<br><br>

Query

Most resources have been spent on applying information retrieval techniques to the physical and medical sciences.  But, has information retrieval been used at all in the natural sciences, social sciences, and humanities?  If so, what have been some of the problems which have been encountered with these subject areas and how have they been solved, if at all?  Have the characteristics of these subject areas necessitated the development of new information retrieval techniques? What are the prospcts for future machine control in these areas?


Matching document IDs

['1082', '807', '174', '545', '1362', '575', '151', '123', '1368', '185', '866', '136', '547', '1273', '898', '1147', '268', '664', '1345', '817', '157', '140', '577', '1067', '1202', '1144', '1392', '555', '596', '1342', '602', '885', '1346', '129', '1348', '899', '202', '1140', '505', '609', '533', '1263', '1432', '816', '624', '484', '105', '132', '626', '143', '17', '977', '177', '784', '1270', '403', '573', '96', '1450', '138', '1409', '358', '1216', '989', '544', '888', '635', '339', '802', '946', '134', '614', '821', '546', '858', '1332', '23', '425', '1076', '863', '1056', '1213', '178', '1205', '673', '906', '1219', '16', '366', '861', '963', '119', '1206', '1135', '811', '1014', '654', '296', '696', '988', '80', '436', '1369', '1097', '828', '1221', '667', '194', '856', '84', '1104', '1095', '12', '301', '392', '1373', '1352', '854', '1048', '1105', '440', '921', '1203', '1336', '1184', '1445']


Precision: 0.056

Recall: 0.206

F1-Score: 0.088<br><br>

Query

Is there any use for traditional classification schemes - DDC, UDC, LC, etc. - in information retrieval systems?  If there is, which scheme appears most suited to machine use and where has it been applied? If there is not, why are these classification schemes irrelevant? Has research shown that a subject classification of knowledge is completely unnecessary in machine systems? Or, have new schemes been devised which appear to be more suited to machine use?


Matching document IDs

['388', '261', '1430', '1442', '485', '746', '263', '1259', '354', '620', '1144', '16', '798', '801', '1419', '377', '488', '503', '262', '260', '564', '1372', '664', '54', '479', '1356', '809', '663', '200', '1072', '176', '874', '290', '687', '1044', '404', '9', '275', '1421', '323', '47', '936', '797', '380', '947', '728', '1404', '197', '321', '889', '1173', '53', '648', '509', '1264', '58', '459', '732', '490', '990', '670', '447', '1407', '688', '766', '985', '190', '1009', '1411', '1392', '1360', '526', '621', '807', '704', '132', '1183', '762', '654', '1415', '327', '699', '805', '675', '558', '501', '615', '475', '26', '140', '508', '789', '774', '370', '645', '534', '30', '981', '839', '773', '866', '957', '278', '1277', '832', '847', '1251', '1273', '1410', '449', '545', '1020', '71', '1365', '25', '916', '556', '115', '946', '502', '155', '536', '722', '962', '611', '788', '1050', '604', '78', '808', '614', '126', '204', '348', '484', '610', '1328', '1449', '535', '1146', '

Precision: 0.068

Recall: 0.258

F1-Score: 0.107<br><br>

Query

Coordinate indexing utilizes descriptors for controlled language.  Of what use are descriptors in the construction of an index?  How can descriptors be used for searching in an information retrieval system?


Matching document IDs

['1124', '1175', '1139', '1171', '1120', '478', '151', '1448', '1073', '434', '664', '389', '175', '510', '830', '390', '1230', '731', '1024', '179', '388', '595', '796', '1277', '687', '468', '321', '641', '1413', '660', '212', '779', '966', '1144', '566', '591', '458', '538', '446', '620', '159', '530', '866', '642', '556', '989', '1283', '889', '257', '378', '1326', '590', '1010', '762', '798', '825', '565', '648', '1261', '603', '291', '703', '702', '72', '1264', '71', '445', '514', '190', '637', '1180', '483', '611', '1419', '129', '49', '318', '522', '254', '1136', '682', '482', '704', '1092', '1054', '419', '523', '1163', '461', '517', '773', '262', '477', '197', '547', '801', '1091', '1012', '531', '508', '572', '606', '57', '1298', '997', '472', '501', '459', '741', '615', '644', '621', '740', '1126', '645', '44', '582', '593', '742', '504', '809', '224', '526', '1077', '309', '497', '701', '542', '27', '562', '61', '535', '67', '117', '174', '376', '336', '690', '1127', '1366

Precision: 0.000

Recall: 0.258

F1-Score: 0.000<br><br>

Query

What are the characteristics of MEDLARS (Medical Literature Analysis and Retrieval System) project which has been undertaken by the National Library of Medicine?  How does it index current medical journals and of what relation is this indexing system to Index Medicus? What are the major components of the MEDLARS project and its major operating details?


Matching document IDs

['830', '219', '526', '202', '817', '465', '534', '888', '208', '423', '1431', '1317', '134', '158', '115', '649', '515', '342', '406', '1111', '397', '506', '725', '1364', '1276', '315', '882', '900', '898', '755', '428', '539', '1427', '706', '816', '385', '596', '1450', '244', '495', '1229', '956', '1035', '324', '783', '918', '1212', '994', '940', '1312', '1204', '1351', '84', '1420', '417', '247']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

How can the computer be used in medical science for diagnostic and clinical record keeping purposes?  Have any programs of automation been tried in hospitals?  If so, what have been the results? What problems have been encountered in the use of automation in medicine?  For what purposes can an automated system of clinical records be used?  What are other possible uses of the computer in medicine?


Matching document IDs

['958', '1147', '1249', '696', '181', '594', '190', '10', '75', '220', '211', '194', '1055', '986', '891', '624', '200', '72', '1050', '1303', '382', '133', '883', '1397', '547', '452', '1188']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

What is the effect on librarians of automation?  Note the new types of technology to be used in the library which will have an effect on the status, position, and function of the librarians.  What changes are being contemplated or have been initiated to introduce automation into the education of librarians?


Matching document IDs

['283', '1212', '141', '917', '816', '392', '167', '1457', '22', '1042', '17', '187', '994', '547', '1240', '1318', '1012', '178', '323', '177', '1264', '6', '409', '410', '896', '408', '334', '1417', '181', '842', '949', '1253', '285', '264', '1182', '1192', '248', '1450', '46', '515', '961', '415', '257', '952', '1237', '188', '1325', '914', '268', '1403', '941', '1090', '260', '1400', '592', '496', '913', '325', '946', '393', '1246', '756', '31', '1020', '552', '918', '90', '927', '213', '976', '985', '841', '975', '1349', '274', '909', '954', '1239', '7', '839', '405', '1249', '933', '881', '942', '1373', '1324', '221', '593', '1245', '1263', '1242', '910', '1008', '1198', '583', '1441', '240', '1404', '242', '818', '811', '1049', '1322', '1317', '302', '925', '291', '1247', '166', '1014', '1268', '282', '237', '1248', '768', '950', '196', '414', '1365', '1022', '373', '163', '273', '1035', '898', '337', '1356', '238', '926', '974', '1028', '307', '275', '244', '20', '821', '1371',

Precision: 0.148

Recall: 0.471

F1-Score: 0.225<br><br>

Query

What are the aims and objectives of the medical literature analysis and retrieval system (MEDLARS)?  How does MEDLARS operate?  What are the possible applications of MEDLARS to future information retrieval systems?


Matching document IDs

['382', '986', '883', '452', '806', '1051', '526', '72', '603', '446', '67', '481', '591', '1016', '135', '1241', '547', '325', '123', '136', '158', '482', '889', '1179', '514', '630', '659', '114', '525', '1249', '1143', '1121', '128', '1038', '716', '606', '408', '595', '57', '202', '1171', '637', '459', '826', '1136', '228', '1007', '434', '244', '497', '530', '483', '1035', '780', '621', '254', '565', '1114', '615', '809', '175', '703', '478', '579', '165', '515', '1092', '213', '1264', '865', '1255', '376', '1418', '208', '670', '348', '737', '993', '454', '1120', '409', '458', '197', '538', '179', '1081', '617', '27', '611', '1106', '645', '815', '445', '1077', '671', '73', '1341', '309', '1098', '159', '1170', '267', '801', '74', '4', '1401', '1173', '317', '1124', '795', '1125', '1437', '1360', '1410', '1104', '731', '71', '17', '501', '119', '773', '874', '1139', '528', '827', '648', '866', '375', '1109', '916', '490', '134', '28', '1078', '474', '319', '494', '519', '1350', '

Precision: 0.018

Recall: 0.500

F1-Score: 0.034<br><br>

Query

The standard method of finding information in today's libraries is through the use of the alphabetically arranged card catalog or the classified catalog based on a classification system such as the DC or LC.  Can these systems be modified for use with automated information retrieval?


Matching document IDs

['655', '502', '1448', '534', '798', '1124', '472', '1196', '1419', '1197', '606', '334', '1391', '530', '257', '388', '1139', '459', '176', '615', '1125', '611', '445', '866', '434', '67', '1053', '1170', '501', '1264', '839', '648', '259', '986', '1120', '1027', '123', '526', '78', '490', '704', '488', '461', '889', '565', '497', '523', '175', '575', '375', '478', '126', '660', '135', '1158', '484', '376', '254', '966', '670', '664', '179', '66', '1092', '827', '993', '1136', '925', '458', '327', '538', '508', '644', '174', '319', '826', '630', '709', '627', '321', '707', '728', '120', '690', '1191', '378', '737', '1179', '1171', '1190', '515', '446', '617', '562', '267', '591', '509', '525', '137', '1078', '451', '474', '773', '481', '1009', '1072', '1259', '594', '309', '363', '620', '539', '323', '512', '448', '1362', '454', '73', '1405', '625', '243', '883', '486', '797', '28', '462', '129', '895', '659', '381', '703', '452', '687', '1126', '762', '528', '637', '114', '1413', '25

Precision: 0.047

Recall: 0.289

F1-Score: 0.080<br><br>

Query

In catalogs which are either arranged alphabetically or arranged by classification number, the LC entry, printed in readable language, is ultimately important because the individual looking for information has a definite author, title, or subject phrase in his language (probably English in our case) in mind.  Will LC entries and subject headings be used in the same manner in automated systems?


Matching document IDs

['1230', '848', '990', '1366', '1215', '1197', '1024', '212', '317', '1091', '970', '1196', '1179', '1395', '1415', '874', '838', '796', '354', '866', '1261', '998', '802', '174', '168', '798', '825', '1139', '1124', '1152', '1298', '820', '461', '64', '1000', '16', '572', '562', '1326', '941', '44', '1170', '809', '257', '639', '231', '702', '1180', '1175', '159', '884', '530', '434', '320', '389', '522', '682', '620', '477', '664', '582', '504', '49', '601', '868', '478', '843', '151', '739', '676', '556', '1402', '575', '999', '576', '648', '1436', '1120', '1171', '1448', '373', '1281', '898', '228', '801', '1427', '472', '609', '244', '502', '180', '390', '865', '150', '213', '947', '447', '1099', '262', '697', '136', '158', '483', '542', '1419', '1267', '795', '245', '666', '445', '993', '895', '482', '641', '726', '336', '1277', '1328', '59', '1110', '1349', '476', '179', '396', '1136', '860', '523', '119', '380', '340', '564', '546', '459', '652', '849', '175', '1077', '862', '6

Precision: 0.019

Recall: 0.556

F1-Score: 0.038<br><br>

Query

Directions in Library Networking Bibliographic control before and after MARC is reviewed.  The capability of keying into online systems brought an interdependence among libraries, the service centers that mediate between them, and the large utilities that process and distribute data.  From this has developed the basic network structure among libraries in the United States.  The independent development of major networks has brought problems in standardization and coordination. The authors point out that while technology has led toward centralization of automated library services, new developments are now pushing toward decentralization.  Coordination is a requirement to avoid fragmentation in this new environment.


Matching document IDs

['528', '960', '1379', '1058', '642', '202', '1375', '387', '964', '244', '497', '688', '646', '357', '1456', '325', '390', '484', '612', '142', '129', '1207', '608', '490', '1426', '67', '1359', '517', '1186', '968', '1065', '1321', '518', '1029', '805', '309', '1202', '350', '628', '1167', '531', '1025', '772']


Precision: 0.023

Recall: 0.022

F1-Score: 0.022<br><br>

Query

Performance Testing of a Book and Its Index as a Information Retrieval System The retrieval performance of book indexes can be measured in terms of their ability to direct a user selectively to text material whose identity but not location is known.  The method requires human searchers to base their searching strategies on actual passages from the book rather than on test queries, natural or contrived.  It circumvents the need for relevance judgement, but still yields performance indicators that correspond approximately to the recall and precision ratios of large document retrieval system evaluation.  A preliminary application of the method to the subject indexing of two major encyclopedias showed one encyclopedia apparently superior in both the finding and discrimination abilities of retrieval performance.  The method is presently best suited for comparative testing since its ability to yield absolute or reproducible measures is as yet not established.


Matching document IDs

['1175', '1419', '530', '207', '357', '874', '52', '1040', '383', '9', '148', '165', '1215', '136', '1066', '642', '352', '1117', '240', '505', '587', '358', '1201', '831', '217', '432', '1304', '1256', '1432', '1235', '850', '85', '1128', '767', '37', '268', '879', '723', '1216', '1436', '1011', '410', '1352', '996', '455', '171', '881', '1343']


Precision: 0.000

Recall: 0.022

F1-Score: 0.000<br><br>

Query

The Combined Use of Bibliographic Coupling and Cocitation for Document Retrieval A linkage similarity measure which takes into account both the bibliographic coupling of documents and their cocitations (both cited and citing papers) produced improved document retrieval over a measure based only on bibliographic coupling.  The test collection consisted of 1712 papers whose relevance to specific queries had been judged by users.  To evaluate the effect of using cocitation data, we calculated for each query two measures of similarity between each relevant paper and every other paper retrieved. Papers were then sorted by the similarity measures, producing two ordered lists.  We then compared the resulting predictions of relevance, partial relevance, and non-relevance to the user's evaluations of the same papers. Overall, the change from the bibliographic coupling measure to the linkage similarity measure, representing the introduction of cocitation data, resulted in better retrieval perfor

Matching document IDs

['523', '492', '528', '805', '806', '61', '79', '956', '590', '962', '74', '448', '894', '28', '575', '1195', '731', '813', '625', '754', '737', '319', '820', '522', '296', '755', '807', '812', '295', '955', '615', '194', '702', '393', '785', '895', '488', '470', '1358', '581', '71', '596', '1099', '531', '176', '481', '1078', '595', '1062', '720', '141', '726', '593', '609', '390', '302', '1170', '586', '685', '162', '1039', '525', '845', '297', '483', '1038', '1230', '484', '1321', '382', '75', '1070', '149', '288', '1448', '458', '538', '1445', '137', '620', '1337', '841', '591', '1175', '630', '641', '897', '1410', '250', '244', '1040', '639', '1186', '1263', '855', '1424', '417', '1214', '389', '152']


Precision: 0.000

Recall: 0.022

F1-Score: 0.000<br><br>

Query

Searching Biases in Large Interactive Document Retrieval Systems The way that individuals construct and modify search queries on a large interactive document retrieval system is subject to systematic biases similar to those that have been demonstrated in experiments on judgements under uncertainty.  These biases are shared by both naive and sophisticated subjects and cause the inquirer searching for documents on a large interactive system to construct and modify queries inefficiently.  A searching algorithm is suggested that helps the inquirer to avoid the effect of these biases.


Matching document IDs

[]


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Fuzzy Requests:  An Approach to Weighted Boolean Searches This article concerns the problem of how to permit a patron to represent the relative importance of various index terms in a Boolean request while retaining the desirable properties of a Boolean system. The character of classical Boolean systems is reviewed and related to the notion of fuzzy sets.  The fuzzy set concept then forms the basis of the concept of a fuzzy request in which weights are assigned to index terms. Ther properties of such a system are discussed, and it is shown that such systems retain the manipulability of traditional Boolean requests.


Matching document IDs

['773', '660', '1363', '562', '706', '488', '1138', '58', '1103', '483', '479', '596', '597', '1124', '381', '644', '862', '871', '394', '1155', '484', '711', '1156', '451', '165', '207', '1130', '1078', '1120', '309', '552', '1126', '1055', '1170', '797', '551', '218', '10', '695', '1327', '1118', '223', '1352', '1158', '898', '339', '526', '788', '71', '382', '783', '891', '1147', '1394', '550', '892', '271', '1231', '188']


Precision: 0.017

Recall: 0.083

F1-Score: 0.028<br><br>

Query

Feature Comparison of an In-House Information Retrieval System With a Commercial Search Service A commercially available online search was used as a standard for comparative searching and evaluation of an in-house information system based on automatic indexing.  System features were identified and evaluated on the basis of their usefulness in various kinds of searching, their ease in implementation, and how they are influenced by differences in user type or specific applications.  Some common features of the commercial system, such as online instruction, user-specified print formats, dictionary display, and truncation, are seen to be unnecessary or impractical for the in-house system.  In designing the in-house system, therefore, detald consideration must be given to the applications, operating environment, and real user needs.  While a commercial system can serve as a useful standard for comparative evaluation, one must be careful not to attempt to duplicate it blindly in-house.


Matching document IDs

[]


Precision: 0.000

Recall: 0.083

F1-Score: 0.000<br><br>

Query

Measurement in Information Science:  Objective and Subjective Metrical Space It is argued that in information science we have to distinguish physical, objective, or document space from perspective, subjective, or information space.  These two spaces are like maps and landscapes: each is a systematic distortion of the other.  However, transformation can be easily made once the two spaces are distinguished.  If the transformations are omitted we only get unhelpful physical solutions to information problems.


Matching document IDs

['944', '1014', '943', '541', '988', '393', '1419', '321', '634', '1147', '373', '525', '265', '1309', '279', '308', '518', '497', '1387', '958', '818', '1429', '993', '137', '1342', '664', '473', '587', '582', '688', '568', '314', '338', '950', '515', '437', '1153', '1253', '1070', '1149', '1037', '616', '128', '560', '1337', '244', '628', '1433', '93', '606', '394', '1444', '649', '1178', '335', '553', '1106', '769', '1422', '992', '1171', '349', '1160', '1181', '1448', '1142', '661', '1215', '248', '1351', '1245', '1093', '796', '228', '457', '585', '826', '123', '496', '1049', '1298', '17', '345', '1058', '1406', '223', '185', '163', '422', '724', '408', '652', '1008', '1340', '916', '583', '889', '1161', '621', '1441', '1021', '1098', '1416', '575', '25', '113', '166', '1173', '43', '1052', '523', '1151', '331', '350', '1427', '42', '1031', '315', '471', '160', '1044', '1202', '1440', '234', '1248', '342', '323', '258', '516', '1388', '500', '340', '1454', '1315', '148', '413', '1

Precision: 0.000

Recall: 0.083

F1-Score: 0.000<br><br>

Query

A Model of Cluster Searching Based on Classification The use of document clusters has been suggested as an efficient file organization for a document retrieval system.  It is possible that by using this information about the relationships between documents that the effectiveness of the system (i.e., its ability to distinguish relevant from non-relevant documents) may also be improved.  In this paper a probabilistic model of cluster searching  based on query classification is described.  This model is tested with retrieval experiments which indicate that it can be more effective than heuristic cluster searches and cluster searches based on other models.  It can also be more effective than a full search in which every document is compared to the query.  The efficiency aspects of the implementation of the model are discussed.


Matching document IDs

['575', '577', '310', '661', '483', '446', '515', '631', '715', '68', '826', '722', '1170', '704', '1164', '1162', '1134', '959', '498', '1120', '630', '703', '58', '429', '1343', '84', '1298', '654', '450', '1448', '42', '963', '625', '426', '872', '466', '561', '502', '721', '773', '591', '1144', '69', '1136', '194', '67', '799', '678', '582', '1171', '962', '1417', '590', '1309', '725', '706', '307', '853', '798', '47', '699', '726', '647', '571', '423', '534', '808', '1057', '639', '528', '461', '1368', '493', '1116', '758', '539', '500', '890', '1367', '294', '481', '1227', '482', '66', '673', '465', '315', '467', '710', '640', '6', '780', '839', '452', '105', '516', '64', '587', '822', '1008', '1075', '278', '107', '690', '1179', '599', '532', '17', '638', '807', '597', '1254', '990', '713', '685', '352', '958', '41', '1353', '130', '456', '40', '222', '155', '1321', '774', '1334', '1001', '1346', '720', '1201', '836', '277', '848', '789', '818', '1014', '1090', '574', '694', '78

Precision: 0.006

Recall: 0.077

F1-Score: 0.011<br><br>

Query

The Technology of Library and Information Networks Current online library network technology is described, including the physical and functional aspects of networks.  Three types of networks are distinguished:  search service (e.g., SDC, Lockheed), customized service that provide bibliographic files (e.g., OCLC, Inc., RLIN), and service center (e.g., NELINET, INCOLSA).  It is predicted that as technology evolves more services will be provided outside the library directly to the user through his home or office.


Matching document IDs

['375', '376', '619', '497', '979', '918', '935', '400', '970', '298', '304', '249', '964', '1365', '406', '181', '22', '1033', '331', '151', '1347', '172', '1072', '556', '1174', '837']


Precision: 0.115

Recall: 0.086

F1-Score: 0.098<br><br>

Query

The Use of Titles for Automatic Document Classification An experimental computer program has been developed to classify documents according to the 80 sections and five major section groupings of Chemical Abstracts (CA).  The program uses pattern recognition techniques supplemented by heuristics.  During the "training" phase, words from pre-classified documents are selected, and the probability of occurrence of each word in each section of CA is computed and stored in a reference dictionary.  The "classification" phase matches each word of a document title against the dictionary and assigns a section number to the document using weights derived from the probabilities in the dictionary.  Heuristic techniques are used to normalize word variants such as plurals, past tenses, and gerunds in both the training phase and the classification phase.  The dictionary lookup technique is supplemented by the analysis of chemical nomenclature terms into their component word roots to influence the sect

Matching document IDs

['71', '1248', '589', '388', '1415', '447', '1091', '898', '409', '1024', '1366', '1342', '1194', '617', '849', '133', '1416', '588', '861', '1432', '1259', '1340', '476', '1047', '400', '1072', '803', '470', '1272', '1314', '1346', '909', '417', '634', '549', '96', '953', '222', '1065', '984', '774', '1079', '902', '407', '290', '239', '1031', '1005', '1351', '12', '1198', '456', '284', '463', '599', '265', '331', '130', '1318']


Precision: 0.017

Recall: 0.031

F1-Score: 0.022<br><br>

Query

Brief Communications Some of the automatic classification procedures used in information retrieval derive clusters of documents from an intermediate similarity matrix, the computation of which involves comparing each of the documents in the collection with all of the others.  It has recently been suggested that many of these comparisons, specifically those between documents having no terms in common, may be avoided by means of the uyse of an inverted file to the document collection.  This communication shows that the approach will effect reductions in the number of interdocument comparisons only if the documents are each indexed by a limited number of indexing terms; if exhaustive indexing is used, many document pairs will be compared several times over and the computation will be greater than when conventional approaches are used to generate the similarity matrix.


Matching document IDs

['570', '328', '1120', '277', '362', '679', '1143']


Precision: 0.000

Recall: 0.031

F1-Score: 0.000<br><br>

Query

The Application of a Minicomputer to Thesaurus Construction The Use of a minicomputer in various phases of creating the thesaurus for the National Information Center for Special Education Materials (NICSEM) database is described.  The minicomputer is used to collect, edit, and correct candidate thesaurus terms.  The use of the minicomputer eases the process of grouping terms into files of similar concepts and facilitates the generation of products useful in vocabulary review and in term structuring.  Syndetic relations, indicated by assigning coded identification numbers, are altered easily in the design phase to reflect restructuring requirements.  Because thesaurus terms are already in machine- readable form, it is simple to prepare print programs to provide permuted, alphabetic, hierarchical, and chart formatted term displays.  Overall, the use of the minicomputer facilitates initial thesaurus entry development by reducing clerical effort, editorial staff decisions, and overall proc

Matching document IDs

['511', '1421', '506', '1076', '1325', '1415', '868', '1366', '378', '581', '825', '174', '593', '1408', '947', '446', '1363', '329', '811', '860', '158', '136', '223', '409', '86', '826', '1449', '842', '141', '1369', '80', '92', '1090', '727', '788', '1328', '140', '620', '728', '945', '5', '47', '764', '571', '371', '959', '278', '1416', '1396', '131', '657', '1058', '1079', '207', '391', '1339', '1372', '145', '1404', '885', '1253', '841', '1371', '1277', '89', '1183', '1248', '743', '27', '165', '963', '1096', '222', '1128', '311', '1417', '474', '1373', '1207', '978', '495', '64', '1454', '938', '470', '1052', '476', '1206', '832', '436', '561', '313', '1365', '459', '1359', '815', '934', '1039', '413', '839', '1229', '1419', '438', '427', '517', '1386', '325', '138', '582', '173', '124', '899', '107', '154', '1002', '193', '417', '1330', '395', '908', '473', '20', '228', '202', '490', '439', '1457', '496', '1240', '1051', '150', '966', '1108', '1082', '453', '1099', '686', '291'

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Adaptive Design for Decision Support Systems Decision Support Systems (DSS) represent a concept of the role of computers within the decision making process.  The term has become a rallying cry for researchers, practitioners, and managers concerned that Management Science and Management Information Systems fields have become unnecessarily narrow in focus.  As with many rallying cries, the term is not well defined.  For some writers, DSS simply mean interactive systems for use by managers.  To others, the key issue is support, rather than system.  They focus on understanding and improving the decision process; a DSS is then designed using any available and suitable technology. Some researchers view DSS as a subfield of MIS, while others regard it as an extension of Management Science techniques.  The former define Decision Support as providing managers with access to data and the latter as giving them access to analytic models.  The key argument of this paper is that the term DSS is rele

Matching document IDs

['359', '164', '973', '1069', '1434']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

An Automatic Method for Extracting Significant Phrases in Scienfific or Technical Documents A new method is described to extract significant phrases in the title and the abstreact of scientific or technical documents.  The method is based upon a text structure analysis and uses a relatively small dictionary. The dictionary has been constructed based on the knowledge about concepts in the field of science or technology and some lexical knowledge.  For significant phrases and their component items may be used in different meanings among the fields.  A text analysius approach has been applied to select significant phrases as substantial and semantic information carriers of the contents of the abstract.  The results of the experiment for five sets of documents have shown that the significant phrases are effectively extracted in all cases, and the number of them for every document and the processing time is fairly satisfactory.  The information representation of the document, partly using t

Matching document IDs

['1124', '483', '666', '175', '820', '1054', '1281', '448', '565', '419', '136', '179', '1327', '566', '68', '571', '1419', '71', '1190', '564', '582', '317', '309', '1191', '1215', '636', '446', '72', '1460', '641', '486', '1175', '484', '319', '49', '449', '825', '329', '1080', '517', '523', '553', '659', '1144', '522', '1298', '461', '707', '472', '321', '575', '228', '576', '137', '1128', '202', '1391', '726', '17', '447', '131', '1207', '542', '381', '1126', '560', '327', '151', '898', '126', '224', '1127', '132', '421', '982', '865', '664', '320', '1448', '511', '1361', '1136', '373', '1105', '64', '510', '123', '826', '644', '754', '135', '842', '95', '73', '1195', '1366', '660', '1044', '815', '840', '1417', '723', '1092', '478', '572', '621', '310', '1427', '1326', '376', '702', '61', '1121', '1255', '409', '682', '562', '150', '1171', '1109', '1091', '728', '1112', '1098', '642', '1125', '158', '737', '889', '519', '1179', '1416', '544', '1114', '1360', '801', '1120', '513', 

Precision: 0.023

Recall: 0.444

F1-Score: 0.044<br><br>

Query

Answer-Passage Retrieval by Text Searching Passage retrieval (already operational for lawyers) has advantages in output form opver references retrieval and is economically feasible. Previous experiments in passage retrieval for scientists have demonstrated recall and false retrieval rates as good or better than those of present reference retrieval services.  The present experiment involved a greater variety of forms of retrieval question.  In addition, search words were selected independently by two different people for each retrieval question. The search words selected, in combination with the computer procedures used for passage retrieval, produced average recall ratios of 72 and 67%, respectively, for the two selectors.  The false retrieval rates were (except for one predictably difficult question) respectively 13 and 10 falsely retrieved sentences per answer-paper retrieved.


Matching document IDs

['636', '487', '820', '603', '806', '523', '519', '810', '520', '150', '197', '381', '1392', '1197', '429', '477', '868', '876']


Precision: 0.000

Recall: 0.444

F1-Score: 0.000<br><br>

Query

Partial-Match Retrieval Using Indexed Descriptor Files In this paper we describe a practical method of partial-match retrieval in very large data files.  A binary code word, called a descriptor, is associated with each record of the file.  These record descriptors are then used to form a derived descriptor for a block of several records, which will serve as an index for the block as a whole; hence, the name "indexed descriptor files."  First the structure of these files is described and a simple, efficient retrieval algorithm is presented.  Then its expected behavior, in terms of storage accesses, is analyzed in detail.  Two different file creation procedures are sketched, and a number of ways in which the file organization can be "tuned" to a particular application is suggested.


Matching document IDs

['1124', '1108', '530', '812', '61', '429', '600', '755', '27', '495', '174', '490', '1160', '47', '73', '630', '1417', '1381', '445', '748', '802', '745', '625', '16', '433', '42', '79', '781', '1044', '960', '1346', '849', '189', '962', '821', '167', '1135', '627', '910', '219', '466', '1174', '609', '198', '1350', '1098', '808', '1074', '432', '652', '638', '585', '1001', '841', '1181', '628', '1104', '1100', '70', '154', '1163', '1354', '87', '846', '31']


Precision: 0.000

Recall: 0.444

F1-Score: 0.000<br><br>

Query

Cooperation and Competition Among Library Networks Recenty technological advances and the success of OCLC, Inc. has led to the emergence of three additional nonprofit library networks:  the Research Libraries Information Network (RLIN) of the Research Libraries Group, Inc., the University of Toronto Library Automation System (UTLAS), and the Washington Library Network (WLN).  This paper examines the economic and technological factors affecting the evolution of these networks and also explores the role of those state and regional (multistate) networks that broker OCLC services.  The competitive and cooperative nature of network relationships is a major theme of the discussion.


Matching document IDs

['119', '481', '1186', '285', '960', '1248', '1052', '1212', '917', '978', '257', '878', '96', '918', '929', '22', '1068', '902', '6', '484', '246', '1441', '476', '1402', '1111', '575', '610', '546', '953', '1271', '1200', '1057', '89', '1387', '458', '538', '1315', '59', '60', '752', '418', '1309', '716', '471', '1254', '28', '1030', '758', '78', '549', '176', '199', '725', '168', '469', '461', '396', '84', '751', '1044', '154']


Precision: 0.000

Recall: 0.444

F1-Score: 0.000<br><br>

Query

An Integrated Understander A new type of natural language parser is presented.  The idea behind this parser is to map input sentences into the deepest form of the representation of their meaning and inferences, as is appropriate.  The parser is not distinct from an entire understanding system.  It uses an integrated conception of inferences, scripts, plans and other knowledge to aid in the parse.  Furthermore, it does not attempt to parse everything it sees.  Rather, it determines what is most interesting and concentrates on that, ignoring the rest.


Matching document IDs

['393', '1338', '358', '14', '22']


Precision: 0.000

Recall: 0.444

F1-Score: 0.000<br><br>

Query

Library Networks and Resource Sharing in the United States: An Historical and Philosophical Overview This paper discusses the origins of library networks and traces their development in the United States in the late 1960s through the present. The concept of resource sharing, with particular attention to the inter- library loan and programs for the cooperative acquisition and storage of materials, is examined in relationship to library networks.  In particular, attention is given to the question of how these two major components of library cooperation, which have tended to be separate, might become more closely integrated.


Matching document IDs

['884', '849', '1152', '992', '556', '1071', '1171', '1427', '481', '572', '417', '796', '702', '1223', '704', '312', '1153', '709', '628', '993', '438', '513']


Precision: 0.091

Recall: 0.033

F1-Score: 0.049<br><br>

Query

Normalization of Titles and Their Retrieval This paper presents a method of normalizations of English titles and their retrieval.  The title expressed by a noun phrase or a noun clause is converted to a function-expression by parsing.  For the retrieval with a reasonable recall rate as well as a high precision rate, the function- expression is transformed to a predicate-governor form, and then normalized to a standard form.  Therefrom, various items are extracted and recorded in a hierarchical tree-like inverted file.  In order to keep the recall rate in a reasonable value, several retrieval stages are implemented based on the key-term and case-label matching.  The retrieval is controlled by the preciseness of the specification of case-labels for each key-term.


Matching document IDs

['862', '919', '479', '58', '1191', '637', '824', '68', '1190', '1460', '489', '706', '883', '136', '1396', '877', '597']


Precision: 0.000

Recall: 0.033

F1-Score: 0.000<br><br>

Query

Cascaded ATN Grammars A generalization of the notion of ATN grammar, called a cascaded ATN (CATN), is prescribed.  CATN's permit a decomposition of complex language understanding behavior into a sequence of cooperating ATN's with separate domain of responsibility, where each stage (called an ATN transducer) takes its input from the output of the previous stage.  The paper includes an extensive discjussion of the principles of factoring-conceptual factoring reduces the number of places that a given fact needs to be represented in a grammar, and hypothesis factoring reduces the number of distinct hypotheses that have to be considered during parsing.


Matching document IDs

[]


Precision: 0.000

Recall: 0.033

F1-Score: 0.000<br><br>

Query

Algorithms for Processing Partial Match Queries Using Word Fragments Algorithms are given to process partially specified queries in a compressed database system.  The proposed methods handle effectively queries that use either whole words or word fragments as language elements. The methods are compared and critically evaluated in terms of the design and retrieval costs.  The analyses show that the method which exploits the interdependence of fragments as well as the relevance of fragments to records in the file has maximum design cost and least retrieval cost.


Matching document IDs

['1366', '523', '620', '321', '512', '1358', '875', '807', '446', '507', '865', '615', '702', '571', '214', '492', '860', '822', '500', '27', '594', '690', '737', '842', '872', '222', '617', '158', '1252', '1264', '1230', '1417', '962', '294', '299', '408', '466', '813', '811', '17', '490', '704', '792', '629', '591', '510', '250', '497', '979', '675', '295', '218', '520', '639', '468', '465', '984', '1365', '126', '952', '840', '921', '809', '324', '1421', '491', '614', '1450', '192', '74', '1371', '281', '292', '249', '1353', '288', '1410', '279', '305', '974', '255', '380', '831', '400', '364', '1396', '348', '496', '848', '1449', '1305', '495', '1258', '849', '1060', '938', '83', '839', '1363', '623', '884', '1368', '1040', '515', '1376', '167', '1374', '795', '879', '482', '326', '271', '16', '834', '723', '897', '584', '957', '12', '367', '551', '976', '1203', '331', '56', '774', '1151', '724', '80', '1390', '1248', '1071', '939', '189', '307', '950', '457', '353']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

A General Formulation of Bradford's Distribution:  The Graph-Oriented Approach From the detailed analysis of eight previously published mathematical models, a general formulation of Bradford's distribution can be deduced as follows:  y = a log(x + c) + b, where y is the ratio of the cumulative frequency of articles to the total number of articles and x is the ratio of the rank of journals to the total number of journals.  The parameters a, b, and c are the slope, the intercept, and the shift in a straight line to log rank, respectively.  Each of the eight models is a special case of the general formulation and is one of five types of formulation.  In order to estimate three unknown parameters, a statistical method using root-weighted square error is proposed.  A comparative experiment using 11 databases suggests that the fifth type of formulation with three unknown parameters is the best fit to the observed data.  A further experiment shows that the deletion of the droop data leads to 

Matching document IDs

['1196', '587', '409', '581', '380', '492', '1073', '750', '1221', '795', '517', '862', '1033']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Lexical Problems in Large Distributed Information Systems The lexical problems in large information systems are created by the necessity of handling a great number of names and their interrelations. Such lexical problems are not covered completely by the concept data dictionaries, which are mostly concerned with database scheme design rather than the execution of operations.  In this paper we introduce our view of a lexical subsystem as a separate component in an information system architecture, to deal with linguistic and control functions concerning the lexical problems in local and network environments.  The lexical suybsystem is a special efficiently organized program package, which plays the role of a "linguistic filter" in a broad sense for lexically incorrect queries, promotes integration of databases and information retrieval systems, and facilitates the creation of local information systems.  We hope that lexical subsystems can become productive for any large, especially distr

Matching document IDs

['497', '1416', '1171', '175', '179', '135', '27', '1448', '459', '1011', '553', '67', '607', '336', '64', '490', '483', '611', '572', '140', '376', '993', '310', '538', '1093', '451', '458', '1053', '593', '1136', '916', '815', '523', '674', '373', '126', '606', '213', '1298', '690', '883', '347', '1105', '595', '158', '128', '1081', '884', '617', '180', '525', '1143', '621', '682', '481', '114', '948', '955', '707', '375', '839', '1223', '1362', '575', '1241', '826', '574', '1128', '528', '325', '779', '202', '594', '350', '254', '579', '1106', '434', '630', '244', '461', '129', '511', '535', '737', '123', '565', '664', '1092', '652', '408', '321', '1114', '1207', '484', '1038', '822', '970', '120', '474', '137', '394', '874', '546', '1173', '504', '1309', '378', '1080', '998', '59', '1012', '119', '1098', '136', '381', '445', '660', '615', '1256', '648', '421', '1358', '319', '1139', '703', '1113', '947', '1077', '1405', '796', '1121', '513', '486', '72', '1170', '1035', '654', '591

Precision: 0.014

Recall: 0.636

F1-Score: 0.027<br><br>

Query

The Relational Model in Information Retrieval The relational model has received increasing attention during the past decade.  Its advantages include simplicity, consistency, and a sound theoretical basis.  In this article, the naturalness of viewing information retrieval relationally is demonstrated.  The relational model is presented, and the relational organization of a bibliographical database is shown. The notion of normalization is introduced and first, second, third, and fourth normal forms are demonstrated.  Relational languages are discussed, including the relational calculus, relational algebra, and SEQUEL. Numerous examples pertinent to information retrieval are presented in these relational languages.  Advantages of the relational approach to information retrieval are noted.


Matching document IDs

['523', '553', '1112', '613', '691', '185', '1261', '189', '752', '293', '186', '645', '21']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Electronic Information Interchange in an Office Environment This paper describes an architectural approach that provides information exchange across a broad spectrum of user applications and office automation offerings.  Some of the architectures described herein are currently implemented in existing IBM products.  These and other architectures will provide the basis for document interchange capability between products such as the IBM 5520 Administrative System, the IBM System/370 Distributed Office Support System (DISOSS), and the IBM Displaywriter System. Specifically described is a document distribution architecture and its associated data streams and others.  A general overview of the architectures as opposed to a detailed technical description is provided.  The architectures described are protocols for interchange between application processes; they do not address the specific user interface.  The document distribution architectures utilize SNA for data transmission and communicat

Matching document IDs

['617', '703', '136', '529', '1365', '375', '126', '180', '1430', '54', '655', '1207', '78', '624', '287', '512', '186', '1009', '957', '332', '380', '1211', '495', '1275', '704', '760', '269', '1237', '1400', '1071', '924', '1147']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

The Use of Automatic Relevance Feedback in Boolean Retrieval Systems A technique is described for automatic reformulation of boolean queries.  Based on patron relevance judgements of an initial retrieval, prevalence measures are derived for terms appearing in the retrieved set of documents that reflect a term's distribution among the relevant and non-relevant documents.  These measures are then used to guide the construction of a boolean query for a subsequent retrieval.  To illustrate the technique, a series of tests is described of its application to a small data base in an experimental environment.  Results compare favourably with feedback as employed in a SMART-type system.  MOre extensive testing is suggested to validate the technique.


Matching document IDs

['810', '1124', '487', '636', '603', '564', '660', '479', '731', '754', '446', '522', '575', '511', '662', '637', '1139', '962', '1274', '633', '630', '706', '579', '643', '663', '388', '785', '571', '824', '627', '780', '489', '835', '623', '1091', '737', '151', '612', '959', '422', '466', '1127', '1144', '798', '409', '1421', '133', '642', '126', '315', '30', '496', '278', '830', '1207', '357', '1092', '493', '682', '1126', '963', '606', '495', '317', '734', '1090', '593', '1359', '474', '758', '427', '563', '646', '1234', '841', '500', '801', '715', '1450', '1130', '1361', '709', '867', '174', '676', '69', '120', '616', '692', '1313', '1398', '302', '499', '408', '1362', '577', '136', '1215', '598', '327', '504', '967', '341', '373', '1405', '1027', '506', '1051', '1414', '527', '714', '1416', '360', '620', '1183', '1317', '291', '1064', '56', '1047', '121', '1315', '587', '602', '273', '426', '556', '945', '5', '821', '90', '907', '132', '359', '592', '795', '936', '316', '15', '98

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Interacting in Natural Language With Artificial Systems:  The Donau Project This paper is intended to propose a new methodological approach to the conception and development of natural language understanding systems. This new contribution is supported by the design, implementation, and experimentation of DONAU:  a general purpose domain oriented natural language understanding system developed and presently running at the Milan Polytechnic Artificial Intelligence Project.  The system is based on a two level modular architecture intended to overcome the lack of flexibility and generality often pointed out in many existing systems, and to facilitate the exchange of results and actual experiences between different projects. The horizontal level allows an independent and parallel development of the single segments of the system (syntactic analyser, information extractor, legality controller).  The vertical level ensures the possibility of changing (enlarging or redefining) the definition of

Matching document IDs

['595', '534', '134', '1164', '1162', '1175', '479', '1399', '72', '1241', '1118', '168', '1407', '161', '716', '343', '1339', '830', '564', '946', '1395', '596', '772', '1049', '331', '1254', '315', '828', '657', '735', '178', '1169', '1356', '198', '922', '81', '249', '367', '75', '1095', '845', '918', '1178', '1403', '1030', '1122', '1200', '1253', '237', '578']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Approximate String Matching Approximate matching of strings is reviewed with the aim of surveying techniques suitable for finding an item in a database when there may be a spelling mistake or other error in the keyword.  The methods found are classified as either equivalence or similarity problems. Equivalence problems are seen to be readily solved using canonical forms. For similarity problems difference measures are surveyed, with a full description of the well-established dynamic programming method relating this to the approach using probabilities and likelihoods.  Searches for approximate matches in large sets using a difference function are seen to be an open problem still, though several promising ideas have been suggested.  Approximate matching (error correction) during parsing is briefly reviewed.


Matching document IDs

['317', '316', '706', '1098', '1105', '356', '121', '1024', '58', '395', '582', '291', '1078', '553', '830', '818', '730', '955', '11', '1213', '1294', '824', '827', '505', '194', '127', '1377']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Using an Online Microfiche Catalog for Technical Service and Retrieval of Bibliographic Data A prototype system is created that integrates a microfiche catalog into an online computer system for bibliographic control.  Costs and operational data are collected and analyzed.  The system permits the more economical microfiche storage of catalog records than would be feasible for comparable online magnetic disk storage.  Experimental tests demonstrate the feasibility of the online microfiche catalog system for use in library technical services and retrieval of bibliographic data.  The primary result of the project is the creation of a completely operational facility, including all equipment, software, procedures, and data bases necessary to demonstrate the system.  A second set of results is derived from the experimental use of the system and the evaluation of costs and times for various operations.  The cost effectiveness of the online microfiche catalog is demonstrated.


Matching document IDs

['17', '497', '164', '1419', '128', '809', '693', '207', '487', '808', '1184', '300', '352', '205', '1131', '602']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Natural Language Access to Information Systems.  An Evaluation Study of Its Acceptance by End Users The question is asked whether it is feasible to use subsets of natural languages as query languages for data bases in actual applications using the question answering system "USER SPECIALTY LANGUAGES" (USL). Methods of evaluating a natural language based information system will be discussed.  The results (error and language structure evaluation) suggest how to form the general architecture of application systems which use a subset of German as query language.


Matching document IDs

['317', '637', '445', '1327', '514', '1427', '1139', '434', '151', '1164', '1162', '530', '1171', '1136', '329', '1024', '378', '19', '396', '1124', '1366', '814', '572', '461', '900', '609', '595', '390', '168', '901', '539', '10', '175', '1185', '702', '1175', '641', '1120', '179', '159', '1443', '801', '78', '477', '597', '1225', '825', '498', '796', '1382', '798', '692', '779', '755', '802', '895', '1215', '328', '136', '1077', '1118', '614', '1099', '450', '1326', '1414', '866', '1129', '817', '1027', '585', '1180', '320', '761', '746', '1226', '389', '257', '25', '576', '197', '1043', '1267', '30', '342', '1409', '902', '760', '174', '357', '687', '1047', '1280', '480', '1046', '769', '789', '1323', '556', '479', '1055', '269', '397', '146', '447', '37', '443', '688', '1459', '149', '830', '788', '838', '873', '343', '65', '1393', '558', '1045', '432', '909']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Some Considerations Relating to the Cost-Effectiveness of Online Services in Libraries In 1978 Collier presented some hypothetical data on economic aspects of the use of online services as compared with subscriptions to printed services in libraries.  Collier's view of the economics of online searching seems misleadingly pessimistic because:  1.  It looks only at costs but not at effectiveness in comparing the two modes of access and searching.  An analysis combining cost and effectiveness aspects (i.e., a cost-effectiveness analysis) would give a completely different picture.  2.  The way the cost data are presented is grossly unfair to the online mode of access and use.  This work contains corrected information regarding online and printed services in libraries.


Matching document IDs

['1365', '1376', '799', '1366', '126', '842', '614', '1248', '728', '1377', '292', '217', '734', '957', '879', '986', '1396', '364', '365', '307', '190', '123', '724', '9', '348', '584', '1015', '358', '1324', '205', '376', '295', '962', '1008', '222', '408', '1368', '1417', '1378', '306', '959', '979', '1221', '162', '204', '575', '1264', '1353', '4', '1401', '161', '839', '841', '1060', '224', '792', '594', '984', '757', '840', '1415', '972', '942', '547', '721', '10', '976', '470', '529', '2', '281', '938', '866', '970', '208', '1360', '946', '297', '1020', '1418', '743', '1252', '153', '92', '1258', '865', '1453', '12', '265', '507', '1018', '279', '17', '1266', '638', '1265', '816', '193', '363', '31', '186', '1362', '115', '900', '963', '1203', '706', '459', '817', '1205', '244', '367', '189', '1230', '1032', '952', '1241', '305', '914', '214', '1212', '1457', '72', '617', '250', '296', '831', '932', '338', '1236', '1390', '930', '1215', '845', '206', '223', '774', '965', '583', 

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Co-Citation Context Analysis and the Structure of Paradigms Many information scientists are concerned with the operation of document retrieval systems serving scientists in various fields.  The scientists served by these systems are often members of what have been called invisible colleges, groups of scientists in frequent communication with one another and involved with highly specialized subject matters.  Often such groups are considered to share an intellectual perspective regarding this subject matter, which is sometimes referred to as a paradigm.  The purpose of this paper is to show how it is possible to identify paradigms, using the techniques of citation analysis.  I will operationalize the notion of paradigm as a 'consensual structure of concepts in a field.' Suppose we have obtained a set of papers pertaining to some topic.  Already knowing something about the field, we read each text and mark passages in which certain specific concepts are used or discussed.  For example, we

Matching document IDs

['972', '781', '641', '1385', '313', '203', '570', '566', '448', '727', '785', '1159']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

Cocited Author Retrieval Online: An Experiment with the Social Indicators Literature One mode of online retrieval in Scisearch or Social Scisearch involves entering pairs of authors' names believed to be jointly cited by subsequent writers and retrieving papers in which cocitations occur.  Six pairs were formed with the names of four authors prominent in the social indicators movement (Bauer, Duncan, Land, and Sheldon).  Documents by the four were not specified.  It was thought that the pair Duncan and Land would retrieve papers in which indicator-type data would be integrated with path-analytic causal modeling.  All other pairs seemed likely to retrieve a "general social indicators" literature.  The 298 retrieved papers confirmed expectattions.  It was found that 121 papers generally cited social indicators (SI) documents by the input authors and frequently had SI language in their titles.  Other signs of content also identified them as papers of the SI movement.  The 177 papers retri

Matching document IDs

['123', '308', '126', '1342', '199', '140', '667', '1348']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>


Query ID: 92
Query: Database and Online Statistics for 1979 The number of databases, records contained in databases and the online use of databases has increased dramatically over the past several years, bringing the 1979 totals for bibliographic, bioliographic-related, and natural language databases to 528.  These 528 databases contain 148 million records.  Some 4 million online searches were conducted via the major U.S. and Canadian systems in 1979.
No matches found.



Query

Experiments in Local Metrical Feedback in Full-Text Retrieval Systems A method of iterative searching, using the results of one iteration search to formulate the next iteration search, was applied to a full-text database consisting of some 2400 documents and 1,3000,000 text-words of Hebrew and Aramaic.  The iterative method consists of clustering the documents returned in an iteration, using weighting by proximity and by frequency simultaneously. The process produces searchonyms, which are terms synonymous to keywords in the context of a single query.  Augumenting or replacing keywords by searchonyms via manual or automatic feedback leads to the formulation of the next iteration search.  The results of the experiment are consistent with those of an earlier small-scale experiment on an English database, and indicate that in contrast to global clustering where the size of matrices limits applications to small databases and improvements are doubtful, local metrical methods appear to be we

Matching document IDs

['1124', '483', '71', '51', '608', '1327', '79', '603', '487', '570', '820', '769', '390', '663', '321', '1164', '806', '503', '1162', '317', '117', '309', '751', '530', '1041', '341', '738', '72', '1382', '1135', '956', '1126', '422', '429', '565', '802', '705', '817', '525', '661', '625', '446', '748', '150', '1416', '175', '680', '190', '731', '478', '377', '659', '811', '722', '252', '34', '673', '1419', '571', '616', '179', '360', '598', '160', '158', '1108', '666', '521', '151', '1252', '1391', '865', '1118', '1399', '669', '998', '1298', '41', '545', '1396', '451', '643', '472', '552', '19', '815', '737', '1199', '1098', '1042', '1191', '1184', '232', '222', '1092', '1024', '1194', '73', '1157', '814', '47', '278', '1061', '657', '332', '773', '1323', '199', '52', '411', '790', '980', '668', '1358', '708', '670', '824', '1326', '1398', '1360', '1190', '424', '1158', '207', '633', '1057', '259', '1167', '198', '1105', '1070', '55', '641', '244', '875', '1172', '105', '282', '42',

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

A Microcomputer Alternative for Information Handling:  Refles REFLES is a microcomputer-based system for data retrieval in library environments.  The problem of information retrieval is discussed from a theoretical point of view, followed by an analysis of the reference process and data thereby gathered, leading to a description of REFLES in terms of its hardware and software.  REFLES, a prototype system at present, currently functions in a test environment.  Examples of data contained in the system and of its use are presented.  Future considerations and speculations on other versions of the system conclude the paper.


Matching document IDs

['490', '575', '528', '512', '518', '129', '779', '590', '993', '67', '737', '604', '481', '1346', '644', '646', '611', '888', '74', '534', '531', '373', '446', '571', '1111', '328', '511', '731', '429', '725', '68', '862', '497', '870', '1090', '1173', '894', '822', '291', '1109', '596', '104', '847', '422', '487', '465', '443', '428', '813', '445', '89', '243', '1358', '767', '718', '873', '167', '294', '185', '615', '866', '809', '292', '889', '1360', '1254', '1024', '687', '572', '1429', '529', '193', '1078', '754', '228', '660', '755', '884', '890', '707', '41', '634', '127', '96', '498', '589', '1334', '640', '233', '1291', '371', '287', '760', '745', '451', '156', '593', '1397', '1273', '197', '1121', '659', '636', '623', '1272', '676', '59', '955', '562', '849', '1010', '592', '27', '776', '958', '482', '1422', '388', '64', '1238', '1252', '973', '52', '53', '347', '546', '11', '1020', '464', '1277', '686', '441', '1427', '457', '724', '795', '962', '161', '85', '97', '1072', '

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

A Comparison of Two Systems of Weighted Boolean Retrieval A major deficiency of traditional Boolean systems is their inability to represent the varying degrees to which a document may be written on a subject. In this article we isolate a number of criteria that should be met by any Boolean system generalized to have a weighting capability.  It is proven that only one weighting rule satisfies these conditions--that associated with fuzzy- set theory--and that this weighting scheme satisfies most of the other properties associated with Boolean algebra as well.  Probabilistic weighting is then introduced as an alternative approach and the two systems compared. In the limit of zero/one weights, all systems considered converge to traditional Boolean retrieval.


Matching document IDs

['531', '810', '319', '895', '523', '446', '838', '519', '512', '773', '484', '659', '1125', '1120', '1054', '388', '1092', '608', '648', '986', '468', '594', '798', '690', '660', '1136', '702', '611', '826', '165', '1448', '661', '381', '459', '486', '1091', '575', '630', '1255', '898', '501', '28', '1170', '318', '737', '61', '1419', '445', '644', '627', '1197', '267', '606', '334', '790', '571', '434', '502', '1053', '779', '114', '1196', '615', '595', '806', '135', '620', '1134', '483', '600', '448', '883', '508', '1180', '565', '73', '68', '634', '497', '1248', '681', '689', '617', '329', '1282', '26', '71', '321', '1259', '727', '1201', '636', '827', '785', '706', '526', '1191', '492', '780', '728', '703', '461', '1171', '67', '562', '530', '820', '534', '1117', '1305', '525', '1139', '451', '839', '625', '670', '641', '1327', '762', '156', '458', '538', '382', '309', '591', '528', '754', '490', '1190', '159', '590', '389', '709', '966', '120', '376', '637', '518', '610', '516', 

Precision: 0.011

Recall: 0.273

F1-Score: 0.021<br><br>

Query

Threshold Values and Boolean Retrieval Systems Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions.  The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights.  Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.


Matching document IDs

['523', '956', '779', '724', '481', '571', '606', '837', '1144', '126', '827', '529', '847', '1375', '627', '525', '1417', '149', '387', '785', '1098', '1413', '745', '1309', '97', '10', '1179', '9', '1068', '1039', '977', '198', '700', '843', '1368', '22', '639', '1249', '1454', '1210', '500', '12', '1057', '1318', '1001', '246', '305', '1189', '919', '33', '991', '1335']


Precision: 0.019

Recall: 0.111

F1-Score: 0.033<br><br>

Query

A Model for a Weighted Retrieval System There has been a good deal of work on information retrieval systems that have continuous weights assigned to the index terms that describe the records in the database, and/or to the query terms that describe the user queries. Recent articles have analyzed retrieval systems with continuous weights of either type and/or with a Boolean structure for the queries.  They have also suggested criteria which such systems ought to satisfy and record evaluation mechanisms which partially satisfy these criteria.  We offer a more careful analysis, based on a generalization of the discrete weights.  We also look at the weights from an entirely different approach involving thresholds, and we generate an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms.  This new mechanism allows the user to attach a "threshold" to the query term.


Matching document IDs

['1415', '448', '77', '659', '688', '603', '824', '579', '870', '706', '446', '960', '512', '54', '508', '790', '773', '660', '1091', '570', '731', '445', '506', '52', '1366', '487', '514', '1363', '629', '1134', '726', '507', '820', '167', '825', '590', '1215', '894', '755', '643', '781', '653', '489', '812', '1230', '373', '873', '531', '197', '1392', '429', '1418', '762', '70', '736', '826', '442', '1090', '168', '1010', '572', '212', '959', '865', '702', '204', '1253', '428', '1443', '28', '1385', '803', '780', '1024', '596', '81', '562', '1164', '616', '1162', '479', '840', '488', '680', '810', '600', '687', '634', '161', '727', '1081', '1124', '530', '1176', '811', '359', '566', '390', '664', '604', '148', '833', '329', '779', '44', '449', '805', '1325', '267', '501', '813', '745', '158', '598', '458', '538', '1073', '511', '272', '888', '1404', '1182', '804', '149', '714', '357', '269', '798', '749', '649', '185', '778', '1171', '1339', '1307', '542', '497', '817', '764', '1326'

Precision: 0.026

Recall: 0.833

F1-Score: 0.050<br><br>

Query

A Translating Computer Interface for End-User Operation of Heterogeneous Retrieval Systems.  I. Design Online retrieval systems may be difficult to use, especially by end users, because of heterogeneity and complexity.  Investigations have concerned the concept of a translating computer interface as a means to simplify access to, and operation of, heterogeneous bibliographic retrieval systems and databases.  The interface allows users to make requests in a common language. These requests are translated by the interface into the appropriate commands for whatever system is being interrogated.  System responses may also be transformed by the interface into a common form before being given to the users.  Thus, the network of different systems is made to look like a single "virtual" system to the user.  The interface also provides instruction and other search aids for the user.  The philosophy, design, and implementation of an experimental interface named CONIT are described.


Matching document IDs

['309', '484', '497', '115', '175', '491', '1360', '179', '1139', '872', '1120', '731', '617', '1054', '566', '526', '703', '720', '530', '151', '991', '64', '1361', '1124', '119', '1092', '1126', '883', '682', '1256', '482', '840', '376', '56', '1105', '501', '1164', '707', '54', '1162', '197', '885', '842', '267', '1207', '689', '676', '958', '624', '998', '710', '681', '477', '68', '701', '1317', '1366', '448', '865', '655', '636', '1377', '737', '603', '507', '419', '244', '1416', '211', '1073', '328', '833', '815', '699', '332', '857', '522', '564', '1000', '553', '990', '762', '85', '517', '217', '1364', '1277', '961', '1144', '848', '1248', '1001', '1044', '666', '450', '488', '214', '745', '467', '608', '30', '498', '271', '288', '397', '500', '1224', '1129', '1274', '1098', '983', '715', '398', '1452', '646', '247', '1234', '194', '776', '863', '886', '77', '480', '790', '1371', '734', '225', '1134', '45', '679', '276', '216', '778', '425', '99', '1296', '861', '1140', '420', 

Precision: 0.012

Recall: 0.069

F1-Score: 0.021<br><br>

Query

A Translating Computer Interface for End-User Operation of Heterogeneous Retrieval Systems.  II. Evaluations The evaluation of the concept of a translating compuyter interface for simplifying operation of multiple, heterogenous online bibliographic retrieval systems has been undertaken.  An experimental retrieval system, named CONIT, was built and tested under controlled conditions with inexperienced end users.  A detailed analysis of the experimental usages showed that users were able to master interface operation sufficiently well to find relevant document references.  Success was attributed, in part, to a simple command language, adequate online instruction, and a simplified natural-language, keyword/stem approach to searching.  It is concluded that operational interfaces of the type studied can provide for increased usability of existing system in a cost effective manner, especially for searchers. Furthermore, more advanced interfaces based on improved instruction and automated sea

Matching document IDs

['648', '626', '594', '523', '546', '728', '593', '606', '1367', '309', '625', '514', '124', '610', '250', '637', '502', '731', '1054', '1368', '534', '486', '28', '575', '547', '492', '382', '1013', '630', '659', '634', '123', '591', '461', '839', '484', '135', '445', '49', '508', '615', '490', '512', '482', '826', '1171', '1170', '175', '491', '1327', '773', '16', '779', '67', '806', '660', '528', '497', '252', '608', '1375', '222', '726', '42', '993', '1418', '373', '820', '458', '959', '46', '310', '1215', '213', '1360', '167', '179', '1358', '342', '433', '692', '1195', '145', '1363', '190', '894', '243', '139', '378', '636', '1183', '327', '510', '370', '538', '1396', '941', '529', '202', '612', '191', '840', '47', '963', '720', '984', '126', '799', '257', '1113', '629', '962', '957', '1035', '495', '1078', '358', '1143', '723', '624', '845', '695', '115', '1361', '224', '456', '1357', '801', '870', '724', '781', '1376', '1043', '1450', '732', '743', '295', '1445', '70', '1440', 

Precision: 0.091

Recall: 0.618

F1-Score: 0.158<br><br>

Query

The Interface Between Computerized Retrieval Systems and Micrographic Retrieval Systems This paper notes the benefits accruing from interaction between computerized retrieval systems and micrographic retrieval systems.  It reviews current state of automated micrographic retrieval technology.  The conclusion is that with a combination of advances in communications technology, and sophisticated indexing input from libraries and information scientists, the new generation of automated micrographs devices may constitute the on-line document retrieval systems of the future.


Matching document IDs

['883', '1124', '131', '636', '716', '123', '655', '916', '631', '481', '993', '795', '72', '17', '661', '922', '1241', '1362', '1144', '1258', '685', '134', '482', '400', '1356', '386', '24', '897', '348', '943', '367', '1346', '718', '80', '846', '1437', '1238', '1082', '1290', '320', '1439', '1457', '100', '621', '801', '1045', '53', '491', '166', '401', '1251', '878', '423', '1273', '977', '485', '1429', '112', '142', '1', '1043', '1079', '901', '1088', '1390', '418', '580', '310', '938', '923', '1344', '847', '32', '1268', '870', '950', '1090', '1025', '767', '873', '1149', '1417', '185', '1383', '907', '915', '1294', '1354', '1388', '143', '453', '935', '902', '1438', '561']


Precision: 0.011

Recall: 0.056

F1-Score: 0.018<br><br>

Query

Parallel Computations in Information Retrieval Conventional information retrieval processes are largely based on data movement, pointer manipulations and integer arithmetic; more refined retrieval algorithms may in addition benefit from substantial computational power.  In the present study a number of parallel processing methods are described that serve to enhance retrieval services.  In conventional retrieval environments parallel list processing and parallel search facilities are of greatest interest.  In more advanced systems, the use of array processors also proves beneficial.  Various information retrieval processes are examined and evidence is given to demonstrate the usefulness of parallel processing and fast computational facilities in information retrieval.


Matching document IDs

['1089', '375', '309', '461', '514', '1078', '805', '363', '655', '853', '565', '175', '575', '704', '151', '28', '378', '1081', '755', '608', '1009', '199', '137', '459', '496', '703', '376', '895', '637', '1139', '1136', '179', '1264', '1120', '611', '1053', '1164', '660', '497', '590', '126', '664', '458', '1162', '538', '615', '490', '733', '1027', '1191', '707', '487', '1158', '319', '495', '1413', '690', '567', '484', '1405', '526', '1368', '78', '174', '523', '509', '1092', '1419', '525', '481', '1134', '1305', '131', '445', '797', '474', '827', '1190', '579', '66', '641', '591', '606', '631', '798', '562', '528', '956', '648', '595', '254', '486', '323', '630', '518', '446', '129', '508', '1180', '512', '123', '460', '659', '135', '1054', '451', '731', '67', '1126', '462', '1125', '1170', '1124', '176', '327', '539', '488', '502', '737', '851', '706', '156', '73', '839', '148', '806', '471', '644', '389', '454', '670', '478', '687', '472', '71', '625', '594', '1201', '1362', '8

Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

The Measurement of Term Importance in Automatic Indexing The frequency characteristics of terms in the documents of a collection have been used as indicators of term importance for content analysis and indexing purposes.  In particular, very rare or very frequent terms are normally believed to be less effective than medium-frequency terms.  Recently automatic indexing theories have been devised that use not only the term frequency characteristics but also the relevance properties of the terms. The major term-weighting theories are first briefly reviewed.  The term precision and term utility weights that are based on the occurrence characteristics of the terms in the relevant, as opposed to the nonrelevant, documents of a collection are then introduced.  Methods are suggested for estimating the relevance properties of the terms based on their overall occurrence characteristics in the collection.  Finally, experimental evaluation results are shown comparing the weighting systems using th

Matching document IDs

['1325', '634', '1419', '666', '700', '456', '185', '357', '646', '1143', '595', '625', '621', '291', '474', '330', '407', '1040', '202', '320', '134', '295', '525', '217', '128', '630', '468', '408', '916', '1095', '160', '821', '75', '540', '900', '364', '854', '112', '1053', '573', '139']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

NDX-100:  An Electronic Filing Machine for the Office of the Future This paper describes the design and implementation of an "electronic filing machine," a machine which is capable of storing large numbers of "unstructured" documents in such a way a particular document may be easily and quickly retrieved.  A functional distributed architecture permits the implementation of the system in a mixture of hardware and software.


Matching document IDs

['687', '702', '897', '512', '376', '528', '880']


Precision: 0.000

Recall: 0.000

F1-Score: 0.000<br><br>

Query

The Selection of Good Search Terms This paper tackles the problem of how one might select further search terms, using relevance feedback, given the search terms in the query.  These search terms are extracted from a maximum spanning tree connecting all the terms in the index term vocabulary.  A number of different spanning trees are generated from a variety of association measures.  The retrieval effectiveness for the different spanning trees is shown to be approximately the same.  Effectiveness is measured in terms of precision and recall, and the retrieval tests are done on three different test collections.


Matching document IDs

['894', '562', '486', '566', '812', '1124', '86', '448', '531', '596', '824', '61', '643', '726', '501', '488', '778', '959', '151', '649', '483', '956', '608', '419', '1184', '1419', '700', '238', '472', '1418', '167', '503', '71', '377', '78', '363', '576', '165', '366', '280', '748', '223', '509', '237', '92', '1448', '754', '865', '1417', '983', '278', '1323', '1243', '1156', '575', '564', '1253', '597', '59', '593', '1328', '51', '189', '1261', '407', '1425', '235', '815', '261', '522', '1072', '1251', '129', '393', '891', '850', '1023', '905', '646', '1183', '1020', '1028', '266', '1450', '182', '76', '661', '1322', '381', '376', '367', '981', '1153', '551', '926', '241', '1369', '242', '181', '260', '936', '568', '898', '1395', '297', '351', '1221', '115', '33', '901', '344', '8', '777', '1238', '46', '198', '485', '1019', '305', '1310', '284', '1094', '147', '80', '415', '903', '1147', '759', '944', '370', '1189', '2', '977', '303', '1202', '12', '1211', '1152', '1383', '221', 

Precision: 0.006

Recall: 0.091

F1-Score: 0.012<br><br>

Query

Indexing Consistency, Quality and Efficiency Indexing quality determines whether the information content of an indexed document is accurately represented.  Indexing effectiveness measures whether an indexed document is correctly retrieved every time it is relevant to a query.  Measurement of these criteria is cumbersome and costly; data base producers therefore prefer inter-indexer consistency as a measure of indexing quality or effectiveness.  The present article assesses the validity of this substitution in various environments.


Matching document IDs

['805', '608', '1359', '517', '309', '1375', '531', '518', '528', '390', '350', '964', '688', '129', '646', '612', '960', '1202', '642', '325', '1426', '490', '968', '202', '244', '1167', '1207', '357', '1065', '497', '142', '387', '1058', '772', '1025', '67', '1321', '484', '628', '1186', '1029', '1379', '1456']


Precision: 0.000

Recall: 0.091

F1-Score: 0.000<br><br>

Query

Text Passage Retrieval Based on Colon Classification:  Retrieval Performance A set of experiments was conducted to determine the suitability of the Colon Classification as a foundation for the automated analysis, representation and retrieval of primary information from the full text of documents.  Primary information is that information embodied in the text of a document, as opposed to secondary information which is generally in such forms as:  an abstract, a table of contents, or an index. Full text databases were created in two subject areas and queries solicited from specialists in each area.  An automated full text indexing system, along with four automated passage retrieval systems, was created to test the various features of the Colon Classification.  Two Boolean-based systems and one simple word occurrence system were created in order to compare the retrieval results against types of systems which are in more common use.  The systems' retrieval performances were measured using r

Matching document IDs

['565', '636', '1124', '702', '1195', '483', '514', '1448', '523', '486', '534', '644', '1419', '61', '820', '595', '575', '1170', '321', '609', '1327', '1298', '1144', '484', '1054', '510', '329', '1120', '71', '175', '641', '446', '179', '571', '1139', '517', '564', '16', '72', '1175', '727', '1078', '478', '458', '1409', '538', '390', '826', '1091', '1391', '180', '319', '526', '535', '522', '806', '591', '378', '389', '492', '434', '1225', '707', '525', '320', '309', '530', '556', '615', '136', '1230', '1105', '1136', '576', '773', '481', '1405', '637', '1261', '986', '798', '257', '252', '839', '1080', '785', '1196', '519', '620', '739', '507', '572', '874', '472', '989', '1035', '966', '889', '752', '336', '898', '28', '448', '690', '174', '682', '825', '135', '606', '512', '1038', '1358', '1114', '421', '177', '64', '779', '865', '659', '1267', '822', '461', '140', '123', '158', '725', '860', '474', '68', '1040', '1215', '445', '796', '165', '1109', '497', '660', '1092', '720', 

Precision: 0.000

Recall: 0.091

F1-Score: 0.000<br><br>

Query

User-Responsive Subject Control in Bibliographic Retrieval Systems A study was carried out of the relationship between the vocabulary of user queries and the vocabulary of documents relevant to the queries, and the value of adding to the document description record in a retrieval system keywords from previous queries for which the document had proved useful. Two test databases incorporating user query keywords were implemented at the School of Library and Information Science, University of Western Ontario.  Clustering of the documents via title and user keywords, a statistical analysis of title-user keyword co-occurrences, and retrieval tests were used to examine the effect of the added keywords.  Results showed the impracticality of the procedure in an operational setting, but indicated the value of analyses with sample data in the development and maintenance of keyword dictionaries and thesauri.


Matching document IDs

['1118', '483', '606', '504', '608', '643', '71', '802', '1091', '773', '176', '1171', '1448', '151', '1224', '30', '989', '212', '508', '501', '434', '627', '589', '506', '798', '390', '781', '53', '1076', '1139', '1144', '1413', '542', '809', '1073', '1163', '1414', '653', '1133']


Precision: 0.000

Recall: 0.091

F1-Score: 0.000<br><br>

Query

A Program for Machine-Mediated Searching A technique of online instruction and assistance to bibliographic data base searchers called Individualized Instruction for Data Access (IIDA) is being developed by Drexel University.  IIDA assists searchers by providing feedback based on real-time analysis while searches are being performed. Extensive help facilities which draw on this analysis are available to users.  Much of the project's experimental work, as described elsewhere, is concerned with the process of searching and the behavior of searchers. This paper will largely address itself to the project's computer system, which is being developed by subcontract with the Franklin Institute's Science Information Services.


Matching document IDs

['579', '637', '512', '547', '243', '65', '529', '692', '743', '1396', '496', '731', '1096', '135', '206', '307', '220', '216', '900', '728', '376', '375', '145', '604', '642', '727', '466', '504', '1078', '1376', '646', '528', '186', '813', '705', '839', '738', '408', '150', '594', '467', '629', '1362', '358', '854', '845', '1105', '4', '1401', '609', '671', '330', '1415', '490', '348', '593', '129', '1368', '1367', '1400', '169', '217', '760', '822', '291', '268', '801', '156', '1356', '1006', '1143', '1370', '465', '197', '892', '879', '252', '1144', '10', '739', '127', '295', '202', '250', '131', '293', '267', '1239', '141', '985', '364', '1147', '124', '1043', '126', '842', '619', '883', '1106', '674', '161', '687', '535', '1375', '818', '153', '1009', '272', '650', '223', '385', '14', '584', '310', '1430', '588', '809', '722', '1221', '17', '582', '1008', '140', '583', '207', '177', '820', '1440', '234', '133', '946', '645', '597', '208', '979', '429', '507', '66', '1192', '254',

Precision: 0.000

Recall: 0.091

F1-Score: 0.000<br><br>

Query

Author Cocitation:  A Literature Measure of Intellectual Structure It is shown that the mapping of a particular area of science, in this case information science, can be done using authors as units of analysis and the cocitations of pairs of authors as the variable that indicates their "distances" from each other.  The analysis assumes that the more two authors are cited together, the closer the relationship between them.  The raw data are cocitation counts drawn online from Social Scisearch (Social Sciences Citation Index) over the period 1972-1979.  GThe resulting map shows (1) identifiable author groups (akin to "schools") of information science, (2) locations of these groups with respect to each other, (3) the degree of centrality and peripherality of authors within groups, (4) proximities of authors within group and across group boundaries ("border authors" who seem to connect various areas of research), and (5) positions of authors with respect to the map's axes, which were arbit

Matching document IDs

['588', '1063', '1274', '632', '1168', '1312', '1338', '1313', '649', '918']


Precision: 0.600

Recall: 0.085

F1-Score: 0.148<br><br>

Query

Progress in Documentation.  Word Processing: An Introduction and Appraisal The "Office of the Future," "Office Technology," "Word Processing," "Electronic Mail," "Electronic Communications," "Convergence," "Information Management."  These are all terms included in the current list of buzz words used to describe current activities in the office technology area.  The high level of investment in factories and plants and the ever-increasing fight to improve productivity by automating the dull, routine jobs are usually quoted and compared with the extremely low investment in improving and automating the equally tedious routine jobs in the office environment; the investment in the factory is quoted as being ten times greater per employee than in the office.  This, however, is changing rapidly and investment on a large scale is already taking place in manhy areas as present-day inflation bites hard, forcing many companies and organizations to take a much closer look at their office operations

Matching document IDs

['964', '400', '497', '178', '572', '970', '490', '408', '350', '664', '17', '122', '1358', '1359', '1407', '585', '1070', '1257', '815', '188', '153', '129', '158', '27', '723', '1408', '525', '141', '523', '291', '842', '433', '484', '1027', '1397', '797', '659', '960', '993', '1105', '511', '309', '822', '692', '593', '348', '594', '984', '1193', '1219', '575', '1418', '515', '1317', '547', '654', '16', '732', '1274', '126', '243', '742', '310', '250', '31', '1208', '1183', '959', '907', '164', '491', '1227', '745', '938', '1051', '302', '1352', '529', '590', '704', '839', '1009', '819', '961', '941', '64', '1364', '44', '684', '1022', '1248', '67', '1171', '115', '674', '584', '1015', '690', '227', '25', '146', '857', '617', '591', '976', '709', '102', '681', '884', '710', '698', '265', '365']


Precision: 0.000

Recall: 0.085

F1-Score: 0.000<br><br>

Query

Document Clustering Using an Inverted File Approach An automated document clustering procedure is described which does not require the use of an inter-document similarity matrix and which is independent of the order in which the documents are processed.  The procedure makes use of an initial set of clusters which is derived from certain of the terms in the indexing vocabulary used to characterise the documents in the file.  The retrieval effectiveness obtained using the clustered file is compared with that obtained from serial searching and from use of the single-linkage clustering method.


Matching document IDs

['570', '422', '483', '1124', '1419', '608', '309', '503', '790', '52', '71', '446', '659', '377', '565', '487', '321', '820', '51', '737', '769', '643', '175', '530', '1118', '738', '1298', '708', '657', '663', '390', '1024', '72', '1126', '179', '79', '865', '429', '525', '661', '625', '824', '956', '673', '603', '571', '1327', '806', '962', '641', '817', '598', '980', '425', '1194', '731', '773', '842', '195', '190', '604', '1092', '1127', '822', '42', '1018', '1391', '875', '207', '299', '666', '105', '1422', '889', '194', '798', '1191', '521', '1108', '1190', '722', '472', '553', '1355', '208', '360', '1381', '1044', '41', '1252', '423', '82', '341', '670', '259', '1009', '146', '690', '1396', '680', '611', '500', '1325', '200', '1095', '151', '269', '668', '811', '77', '1164', '198', '1162', '34', '50', '1226', '225', '117', '669', '758', '633', '516', '1155', '705', '815', '1323', '545', '1292', '751', '317', '1416', '1360', '527', '552', '278', '814', '543', '222', '373', '816'

Precision: 0.008

Recall: 0.333

F1-Score: 0.015<br><br>

Query

A Fast Procedure for the Calculation of Similarity Coefficients in in Automatic Classification A fast algorithm is described for comparing the lists of terms representing documents in automatic classification experiments.  The speed of the procedure arises from the fact that all of the non-zero-valued coefficicents for a given document are identified together, using an inverted file to the terms in the document collection.  The complexity and running time of the algorithm are compared with previously described procedures.


Matching document IDs

['853', '503', '45', '576', '1124', '596', '328', '483', '1132', '643', '487', '485', '530', '608', '1421', '769', '812', '309', '636', '1118', '71', '566', '754', '825', '714', '179', '706', '175', '966', '1215', '68', '666', '842', '84', '1202', '635', '247', '508', '1139', '17', '16', '1398', '1028', '811', '64', '1045', '495', '1109', '634', '979', '834', '135', '1131', '423', '194', '591', '178', '264', '917', '1038', '1450', '200', '625', '1328', '1176', '689', '439', '174', '887', '756', '806', '1252', '374', '27', '1042', '504', '677', '593', '484', '610', '647', '1422', '311', '1218', '208', '1012', '38', '217', '249', '897', '527', '143', '227', '410', '326', '765', '385', '528', '1358', '352', '348']


Precision: 0.000

Recall: 0.333

F1-Score: 0.000<br><br>

Mean Average Precision: 0.054