# LaReQA-Based Multilingual Question Answering System

This notebook demonstrates a multilingual question answering system modeled on the ideas behind Google's LaReQA model.

## Overview

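LaReQA (Language-agnostic answer Retrieval from a Question and Answer corpus) is an approach to cross-lingual question answering. Questions and candidate answers in any language are mapped into a shared representation, so the best answer can be retrieved regardless of the language it is written in. This means you can: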

- Ask questions in one language
- Retrieve answers from documents in different languages
- Build truly multilingual knowledge systems
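To make the idea concrete, here is a toy sketch of language-agnostic retrieval. Every answer, whatever its language, lives in one vector space, and answers are ranked by cosine similarity to the question's vector. The three-dimensional vectors are hand-written stand-ins (a real LaReQA-style encoder would compute high-dimensional embeddings with a multilingual model), and `cosine`, `retrieve` and `ANSWER_POOL` are hypothetical names, not part of `multilingual_qa_system`.

```python
import math

# Hand-written toy "embeddings", made up for illustration only.
# A real encoder would produce these vectors from the text itself.
ANSWER_POOL = {
    ("en", "Machine learning lets computers learn from data."): [0.9, 0.1, 0.0],
    ("es", "El aprendizaje automático permite aprender de los datos."): [0.85, 0.15, 0.05],
    ("de", "Berlin ist die Hauptstadt von Deutschland."): [0.0, 0.2, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(question_vec, pool, top_k=2):
    """Rank every answer in the pool by similarity, ignoring its language."""
    scored = [(cosine(question_vec, vec), lang, text) for (lang, text), vec in pool.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:top_k]

# Pretend this vector encodes the French question
# "Qu'est-ce que l'apprentissage automatique ?" with the same (hypothetical) encoder.
question_vec = [0.88, 0.12, 0.02]
for score, lang, text in retrieve(question_vec, ANSWER_POOL):
    print(f"{score:.3f} [{lang}] {text}")
```

The French question never appears in the pool, yet the English and Spanish answers about machine learning rank above the unrelated German one, because only vector closeness matters.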

## Key Features

- ✅ Cross-lingual information retrieval
- ✅ Support for 8+ languages
- ✅ Interactive knowledge base management
- ✅ Real-time answer ranking
- ✅ Extensible architecture

## 1. Setup and Imports

```python
# Import the multilingual QA system
from multilingual_qa_system import MultilingualQASystem

# Additional imports for data visualization
import json
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ Setup complete!")
```

## 2. Initialize the QA System

```python
# Create an instance of the Multilingual QA System
qa_system = MultilingualQASystem()

# Display initial statistics
qa_system.display_statistics()
```

## 3. Visualize Knowledge Base Distribution

```python
# Get statistics
stats = qa_system.get_statistics()

# Create visualizations
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Language distribution
languages = list(stats['languages'].keys())
counts = list(stats['languages'].values())
axes[0].bar(languages, counts, color='skyblue', edgecolor='navy', alpha=0.7)
axes[0].set_xlabel('Language Code', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Number of Entries', fontsize=12, fontweight='bold')
axes[0].set_title('Knowledge Base - Language Distribution', fontsize=14, fontweight='bold')
axes[0].grid(axis='y', alpha=0.3)

# Category distribution
categories = list(stats['categories'].keys())
cat_counts = list(stats['categories'].values())
colors = plt.cm.Set3(range(len(categories)))
axes[1].pie(cat_counts, labels=categories, autopct='%1.1f%%', startangle=90, colors=colors)
axes[1].set_title('Knowledge Base - Category Distribution', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print(f"\n📊 Total entries in knowledge base: {stats['total_entries']}")
```
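The plotting cell expects `get_statistics()` to return a dict with `'total_entries'` plus `'languages'` and `'categories'` mappings from label to count. A minimal sketch of how such counts could be produced with `collections.Counter`, assuming each entry is a dict with `'language'` and `'category'` keys (the fields passed to `add_to_knowledge_base()` in Section 5). `compute_statistics` and the sample entries are hypothetical; the real module may compute these differently.

```python
from collections import Counter

def compute_statistics(knowledge_base):
    """Count entries per language and per category (hypothetical helper)."""
    return {
        "total_entries": len(knowledge_base),
        "languages": dict(Counter(e["language"] for e in knowledge_base)),
        "categories": dict(Counter(e["category"] for e in knowledge_base)),
    }

# Toy entries for illustration only.
sample_kb = [
    {"language": "en", "question": "What is AI?",
     "answer": "The simulation of intelligent behavior by machines.", "category": "Technology"},
    {"language": "es", "question": "¿Qué es la IA?",
     "answer": "La simulación de comportamiento inteligente por máquinas.", "category": "Technology"},
    {"language": "en", "question": "What is Python?",
     "answer": "A general-purpose programming language.", "category": "Programming"},
]
print(compute_statistics(sample_kb))
```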

## 4. Ask Questions in Different Languages

Let's test the cross-lingual capabilities of our system!

### Example 1: English Query

```python
# Ask a question in English
results = qa_system.ask_question("What is artificial intelligence?")
```

### Example 2: Cross-lingual Query (Spanish keywords)

```python
# Ask using Spanish keywords ("machine learning")
results = qa_system.ask_question("aprendizaje automático")
```
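How a plain-text query such as "aprendizaje automático" finds its match depends on how `MultilingualQASystem` scores entries, which this notebook does not show. If the scoring were purely lexical, a Spanish query could only match entries that share its words. The sketch below (standard library only; `bag_of_words` and `lexical_cosine` are hypothetical helpers, not the module's code) makes that concrete: the Spanish phrase scores zero against an English sentence on the same topic. Closing that gap is what a shared multilingual embedding, as in LaReQA, is meant to do.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; in Python 3, \\w also matches accented letters like 'á'."""
    return Counter(re.findall(r"\w+", text.lower()))

def lexical_cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

query = "aprendizaje automático"
print(lexical_cosine(query, "El aprendizaje automático es una rama de la IA"))
print(lexical_cosine(query, "Machine learning is a branch of AI"))  # no shared words
```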

### Example 3: Query about Machine Learning Benefits

```python
# Ask about ML benefits
results = qa_system.ask_question("machine learning benefits advantages")
```

### Example 4: Neural Networks

```python
# Ask about neural networks
results = qa_system.ask_question("neural network")
```

### Example 5: Deep Learning (Chinese keywords)

```python
# Query using Chinese keywords ("deep learning")
results = qa_system.ask_question("深度学习")
```

## 5. Add New Knowledge to the System

The knowledge base can also be expanded at runtime by adding new question-answer pairs in any language.

```python
# Add a new Q&A pair about computer vision
qa_system.add_to_knowledge_base(
    language='en',
    question='What is computer vision?',
    answer='Computer vision is a field of artificial intelligence that trains computers to interpret and understand the visual world. Using digital images and deep learning models, machines can accurately identify and classify objects.',
    category='Technology'
)
```

```python
# Add another entry in Spanish ("What is natural language processing?")
qa_system.add_to_knowledge_base(
    language='es',
    question='¿Qué es el procesamiento del lenguaje natural?',
    answer='El procesamiento del lenguaje natural (PLN) es una rama de la inteligencia artificial que ayuda a las computadoras a entender, interpretar y manipular el lenguaje humano.',
    category='Technology'
)
```

```python
# Verify the new entries were added
qa_system.display_statistics()
```
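This notebook does not show what checks `add_to_knowledge_base()` performs. Here is a sketch of one reasonable policy, assuming entries are plain dicts with the four fields used above: reject missing or empty fields, and skip a question that already exists in the same language (compared case-insensitively, ignoring surrounding whitespace). `add_entry` and `ENTRY_FIELDS` are hypothetical names.

```python
ENTRY_FIELDS = ("language", "question", "answer", "category")

def add_entry(knowledge_base, **entry):
    """Append a validated Q&A entry; return False if it duplicates an existing one."""
    missing = [f for f in ENTRY_FIELDS if not entry.get(f)]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    key = (entry["language"], entry["question"].strip().lower())
    if any((e["language"], e["question"].strip().lower()) == key for e in knowledge_base):
        return False  # same question already stored in this language
    knowledge_base.append(dict(entry))
    return True

kb = []
print(add_entry(kb, language="en", question="What is computer vision?",
                answer="A field of AI that interprets images.", category="Technology"))
print(add_entry(kb, language="en", question="what is computer vision? ",
                answer="Same question, different wording.", category="Technology"))
print(len(kb))
```

Keying duplicates on (language, question) keeps the Spanish and English versions of the same question as separate entries, which is what cross-lingual retrieval needs.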

## 6. Test the Newly Added Knowledge

```python
# Search for computer vision
results = qa_system.ask_question("computer vision image recognition")
```

```python
# Search for NLP in Spanish
results = qa_system.ask_question("procesamiento lenguaje natural")
```

## 7. Analyze Query Results

Let's analyze the similarity scores for different queries.

```python
# Test multiple queries and collect scores
test_queries = [
    "artificial intelligence",
    "machine learning",
    "neural networks",
    "data science",
    "deep learning"
]

query_scores = []

for query in test_queries:
    results = qa_system.search_answers(query, top_k=1)
    if results:
        query_scores.append({
            'query': query,
            'score': results[0]['similarity_score'],
            'language': results[0]['language']
        })

# Visualize the scores
plt.figure(figsize=(12, 6))
queries = [item['query'] for item in query_scores]
scores = [item['score'] for item in query_scores]
languages = [item['language'] for item in query_scores]

bars = plt.bar(queries, scores, color='coral', edgecolor='darkred', alpha=0.7)

# Add language labels on bars
for bar, lang in zip(bars, languages):
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{lang}',
             ha='center', va='bottom', fontweight='bold')

plt.xlabel('Query', fontsize=12, fontweight='bold')
plt.ylabel('Similarity Score', fontsize=12, fontweight='bold')
plt.title('Query Performance - Top Match Similarity Scores', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
```
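`search_answers(query, top_k=1)` returns a list whose entries carry `'similarity_score'` and `'language'`. Assuming the system scores every entry and then keeps the best `top_k` (the notebook does not show this), the selection step can be sketched with `heapq.nlargest`, which returns the k largest items in descending order without fully sorting the list. `top_k_matches` and the candidate scores below are hypothetical toy values.

```python
import heapq

def top_k_matches(scored_entries, k=1):
    """Return the k entries with the highest 'similarity_score', best first."""
    return heapq.nlargest(k, scored_entries, key=lambda e: e["similarity_score"])

# Made-up scores for illustration only.
candidates = [
    {"language": "en", "similarity_score": 0.42},
    {"language": "es", "similarity_score": 0.87},
    {"language": "zh", "similarity_score": 0.65},
]
print(top_k_matches(candidates, k=2))
```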

## 8. Save the Knowledge Base

```python
# Save the current knowledge base to a file
qa_system.save_knowledge_base('lareqa_knowledge_base.json')

# Display a sample entry
print("\nSample entry from knowledge base:")
print(json.dumps(qa_system.knowledge_base[0], indent=2, ensure_ascii=False))
```
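Because the knowledge base mixes scripts (Spanish accents, Chinese characters), it is worth writing the file with an explicit UTF-8 encoding: `open()` without `encoding=` uses the platform default, which may not be UTF-8 on Windows, and `ensure_ascii=False` keeps the text readable instead of `\u` escapes. The sketch below is a hypothetical stand-in, not the module's `save_knowledge_base` implementation. It also writes to a temporary file and renames it, so an interrupted save does not leave a half-written file.

```python
import json
import os
import tempfile

def save_knowledge_base(entries, path):
    """Write entries to `path` as UTF-8 JSON via a temp file plus rename (hypothetical)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            # ensure_ascii=False writes "¿Qué" as-is rather than "\u00bfQu\u00e9"
            json.dump(entries, f, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file, then re-raise
        raise

entries = [{"language": "es",
            "question": "¿Qué es el procesamiento del lenguaje natural?",
            "answer": "Una rama de la inteligencia artificial.",
            "category": "Technology"}]

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "lareqa_knowledge_base.json")
    save_knowledge_base(entries, target)
    with open(target, encoding="utf-8") as f:
        raw = f.read()

print(raw)
```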

## 9. Load a Saved Knowledge Base

```python
# Create a new instance and load the saved knowledge base
new_qa_system = MultilingualQASystem()
new_qa_system.load_knowledge_base('lareqa_knowledge_base.json')
new_qa_system.display_statistics()
```
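Loading is the mirror image: read with the same explicit UTF-8 encoding and check the structure before trusting it, since a hand-edited JSON file may drop a field. A minimal sketch, assuming the file holds a JSON list of entry dicts with the four fields used in Section 5; `load_knowledge_base` here and `REQUIRED_KEYS` are hypothetical, not the module's own method.

```python
import json
import os
import tempfile

REQUIRED_KEYS = {"language", "question", "answer", "category"}

def load_knowledge_base(path):
    """Read a UTF-8 JSON list of entries, rejecting malformed ones."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of entries")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {i} is missing {sorted(missing)}")
    return entries

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "kb.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"language": "zh", "question": "什么是深度学习?",
                    "answer": "深度学习是机器学习的一个分支。",
                    "category": "Technology"}], f, ensure_ascii=False)
    loaded = load_knowledge_base(path)

print(loaded[0]["question"])
```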

## 10. Interactive Query Section

Try your own queries here!

```python
# Try your own query here
my_query = "your question here"  # Replace with your question
results = qa_system.ask_question(my_query)
```

## 11. Future Enhancements

This system can be extended with:

1. **Real LaReQA Integration**: Connect to TensorFlow Hub's LaReQA model
2. **Advanced Embeddings**: Use multilingual BERT or XLM-RoBERTa
3. **Web Interface**: Build a Flask/FastAPI web application
4. **Database Backend**: Use PostgreSQL or MongoDB for scalability
5. **Real-time Translation**: Integrate Google Translate API
6. **Answer Generation**: Use T5 or GPT models for answer synthesis
7. **Voice Interface**: Add speech-to-text and text-to-speech
8. **Context Awareness**: Implement conversation history tracking

## Conclusion

This notebook demonstrates a multilingual question answering system modeled on LaReQA principles. The system showcases:

- ✅ Cross-lingual information retrieval
- ✅ Dynamic knowledge base management
- ✅ Real-time answer ranking
- ✅ An extensible architecture that can grow toward production use

For more information, visit the [LaReQA model on Kaggle](https://www.kaggle.com/models/google/lareqa).