# BERT Embeddings for Financial News Articles

This notebook demonstrates how to use a BERT model to create embeddings for financial news articles. The setup mirrors the Oliver Wyman NewsTrack model and serves as a proof of concept (POC) for embedding model support in the ValidMind Developer Framework.

## ValidMind at a glance

ValidMind's platform enables organizations to identify, document, and manage model risks for all types of models, including AI/ML models, LLMs, and statistical models. As a model developer, you use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on documentation projects. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between you and model validators.

If this is your first time trying out ValidMind, you can make use of the following resources alongside this notebook:

- [Get started](https://docs.validmind.ai/guide/get-started.html) — The basics, including key concepts, and how our products work
- [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html) —  The path for developers, more code samples, and our developer reference

## Before you begin

::: {.callout-tip}
### New to ValidMind? 
For access to all features available in this notebook, create a free ValidMind account. 

Signing up is FREE — [**Sign up now**](https://app.prod.validmind.ai)
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

In [None]:
%pip install -q validmind

## Initialize the client library

Every documentation project in the Platform UI comes with a _code snippet_ that lets the client library associate your documentation and tests with the right project when you run this notebook. Documentation projects act as containers for model documentation and validation reports, so you can organize all of your documentation work in one place.

Get your code snippet by creating a documentation project:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. Go to **Documentation Projects** and click **Create new project**.

3. Select **`[Demo] Customer Churn Model`** and **`Initial Validation`** for the model name and type, give the project a unique name to make it yours, and then click **Create project**.

4. Go to **Documentation Projects** > **YOUR_UNIQUE_PROJECT_NAME** > **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [None]:
## Replace this placeholder with the code snippet from your own project ##

import validmind as vm

vm.init(
    api_host="...",
    api_key="...",
    api_secret="...",
    project="..."
)
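
If you prefer not to paste credentials into the notebook, one option is to read them from environment variables and pass them to `vm.init`. The sketch below is only an illustration: the variable names (`VM_API_HOST` and so on) and the helper `load_vm_credentials` are assumptions made for this example, not something the snippet from the Platform UI requires.

```python
import os

# Hypothetical mapping from vm.init keyword arguments to environment variable
# names. These names are assumptions chosen for this example.
ENV_VARS = {
    "api_host": "VM_API_HOST",
    "api_key": "VM_API_KEY",
    "api_secret": "VM_API_SECRET",
    "project": "VM_API_PROJECT",
}


def load_vm_credentials(environ=None):
    """Collect the vm.init keyword arguments from a mapping (os.environ by default)."""
    environ = os.environ if environ is None else environ
    # Fail early, naming every variable that is unset or empty.
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: environ[name] for arg, name in ENV_VARS.items()}


# Usage, once the variables are set in your shell:
# import validmind as vm
# vm.init(**load_vm_credentials())
```

Keeping the lookup in one small function makes a missing value fail with a clear message before any call reaches the API.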

In [None]:
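from transformers import pipeline

# Build a Hugging Face feature-extraction pipeline that uses the pretrained
# bert-base-uncased model and its matching tokenizer
embedding_model = pipeline('feature-extraction', model='bert-base-uncased', tokenizer='bert-base-uncased')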
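
A feature-extraction pipeline returns one vector per token, so a whole article needs some pooling step before it can be treated as a single embedding. A common choice is mean pooling, sketched below in plain Python on toy data. Whether the ValidMind model wrapper pools this way is an assumption; the `mean_pool` helper and the toy vectors are illustrative only.

```python
from statistics import fmean


def mean_pool(token_vectors):
    """Average a list of per-token vectors into one fixed-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    # zip(*...) groups the values of each hidden dimension across all tokens.
    return [fmean(dimension) for dimension in zip(*token_vectors)]


# Toy stand-in for one article's output: 3 tokens, hidden size 4.
# (bert-base-uncased would produce 768 values per token.)
toy_tokens = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]

article_vector = mean_pool(toy_tokens)
print(article_vector)  # [2.0, 1.0, 2.0, 2.0]
```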

In [None]:
import pandas as pd

news_articles = [
    "In a stunning debut, BlueChip Technologies' shares opened at $45, nearly double its initial offer price of $25. Investors flocked to the tech firm known for its pioneering work in quantum computing. Analysts are watching closely to see if this bullish trend will continue in the coming weeks.",
    "Gold prices took a dramatic hit today, dropping 20% after reports of vast gold reserves discovered on Mars. This unprecedented find by space miners from the Venus Ventures has investors rethinking the future of earthly commodities.",
    "In a surprising move, the Central Bank announced the adoption of 'CentraCoin' as its official digital currency. This comes after months of speculation and debates about the future of digital assets in the mainstream financial sector. The bank's digital coin will be launched next month.",
    "The escalating trade tensions between Zephyria and Eastland have taken a toll on the global auto industry. Major automakers have reported supply chain disruptions, leading to a decline in quarterly profits. Investors are urged to exercise caution as negotiations continue.",
    "A recent report reveals that green bonds and sustainable investments have surged by 50% in the past year. Environmental concerns and a growing trend towards ethical investing are believed to be key drivers. Big investment firms are now ramping up their eco-friendly portfolios.",
    "MegaCorp Inc. and GiantEnterprises announced a historic merger, creating the world's largest conglomerate. Stocks soared by 30% in pre-market trading. Investors are optimistic, but regulators are scrutinizing the deal for potential antitrust issues.",
    "VirtualLand, a leading VR company, revealed breakthrough technology in haptic feedback. Following the announcement, its stock price surged by 40%. Competitors are racing to catch up, believing haptic advancements will revolutionize the industry.",
    "The agricultural sector is facing disruptions from a major locust invasion, severely impacting harvest forecasts. Commodity prices for staple grains have spiked, affecting global markets. Governments are deploying countermeasures to mitigate widespread economic impact.",
    "Following a decade of operation, social platform ConnectWorld announced its closure amid dwindling user interest and revenue. The announcement caused a company stock plummet of 75%. Analysts predict a market scramble to capture orphaned market share.",
    "After years of steady 2% growth, the nation shocked global markets with a sudden 5% GDP contraction. Experts cite a perfect storm of internal political strife, untimely natural disasters, and aggressive foreign economic policies. Recovery strategies are underway.",
    "Tech titan VirtuaTech unveiled its new line of augmented reality glasses today. This announcement pushed its stocks up by 20%, with analysts predicting a new wave in AR consumer tech.",
    "The global chocolate market is in chaos as cocoa bean yields plummet in major producing countries. Rumors of a cocoa disease have driven chocolate prices to a 10-year high.",
    "EcoJet, the electric plane manufacturer, announced a successful test flight of its first commercial aircraft. The aviation industry is abuzz with talks of a more sustainable future, causing EcoJet's shares to skyrocket.",
    "Rumors circulate that Bank of Infinity is facing a massive data breach, potentially exposing millions of customer records. Its shares have tumbled 15% in early trading.",
    "SolarTech, a leader in renewable energy, has secured a contract to power all government buildings with solar energy in the next five years. Investors are keenly watching this space.",
    "The booming e-sports industry faces its first major hurdle as the Global E-Sports Federation uncovers a massive match-fixing scandal. Major sponsors are reconsidering their investments.",
    "A sudden spike in demand for rare earth minerals, vital for tech production, has countries scrambling to secure their own sources. This comes after a major supplier announced a significant export cut.",
    "The pharmaceutical giant MedHeal announced a breakthrough in Alzheimer's research. Hopeful news sent their stocks soaring, but some experts urge caution until more results are published.",
    "A major undersea cable, responsible for 30% of global internet traffic, suffered damages, sending tech company stocks into a frenzy. Immediate repairs are being prioritized, but short-term disruptions are expected.",
    "Global coffee chains are grappling with a shortage of premium beans after unexpected frosts hit major coffee plantations. Consumers are bracing for a hike in coffee prices.",
    "E-commerce giant BuyNShip reported its most significant quarterly loss in five years. Experts believe this is due to the resurgence of brick-and-mortar stores and localized shopping trends.",
    "The rising trend of urban farming has led to a 40% increase in startups focusing on rooftop and balcony farming solutions. Investors are seeing green in more ways than one.",
    "The luxury car segment is experiencing an unexpected downturn, with global sales declining by 25%. Analysts point to a shifting preference towards sustainable and shared mobility solutions.",
    "RealEstateHub, the AI-driven property platform, declared bankruptcy after being hit by a series of lawsuits challenging its valuation algorithms. The real estate industry is on edge.",
    "The world's largest diamond, triple the size of the previous record-holder, was unearthed in a mine in Southfrica. The gem industry is buzzing with speculation about its potential value.",
    "Popular fitness app RunMasters is under scrutiny after claims that its calorie-tracking algorithm is flawed. The company's shares dipped by 12% after the news.",
    "CruiseTour, the major travel company, announced its move into space tourism, partnering with space agencies for lunar trips. The travel industry is watching with bated breath.",
    "A massive malware attack on several agricultural drone fleets has raised questions about food security and the risks of tech in farming. Tech firms are rushing to release security patches.",
    "The global wine industry is celebrating as scientists discover a grape variant that thrives in colder climates, promising a boost to wine production in non-traditional regions.",
    "Financial regulators unveiled a new framework for global cryptocurrency transactions, aiming to bring more transparency and security. Crypto markets responded positively, with a 10% overall increase.",
]

news_articles_df = pd.DataFrame(news_articles, columns=['article'])
news_articles_df.head()

In [None]:
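# Register the DataFrame as a ValidMind dataset; `article` holds the raw text
vm_test_ds = vm.init_dataset(
    dataset=news_articles_df,
    text_column="article",
)

# Wrap the Hugging Face pipeline as a ValidMind model tied to that dataset
vm_model = vm.init_model(
    embedding_model,
    test_ds=vm_test_ds,
)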

In [None]:
vm_model.predict(news_articles)

In [None]:
from validmind.tests import run_test

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.DescriptiveAnalytics",
    model=vm_model,
)

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.CosineSimilarityDistribution",
    model=vm_model,
)
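
For intuition about what a cosine similarity distribution measures, the sketch below computes pairwise cosine similarities over a few toy vectors. It is a plain-Python illustration of the general technique, not the implementation behind the ValidMind test; the helper names and the toy vectors are made up for this example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarities(vectors):
    """Similarity for every unordered pair of vectors."""
    return [cosine_similarity(a, b) for a, b in combinations(vectors, 2)]


# Toy 2-dimensional "embeddings": two orthogonal vectors and one in between.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
similarities = pairwise_similarities(toy_embeddings)
print(similarities)
```

Values close to 1 mean two texts point in nearly the same direction in embedding space, while values near 0 mean they are close to orthogonal.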

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.ClusterDistribution",
    model=vm_model,
)

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.EmbeddingsVisualization2D",
    model=vm_model,
)

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.StabilityAnalysisRandomNoise",
    model=vm_model,
)
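
A stability test of this kind generally perturbs the input text and checks how far the embeddings move. As a rough sketch of the perturbation step only, the function below swaps adjacent characters at random. The function name, the `rate` and `seed` parameters, and the swap strategy are assumptions for illustration; the noise the ValidMind test actually applies may differ.

```python
import random


def swap_adjacent_chars(text, rate=0.1, seed=0):
    """Return a copy of `text` with some adjacent character pairs swapped.

    `rate` is the probability of swapping at each position; `seed` makes
    the perturbation reproducible.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


original = "Investors are keenly watching this space."
noisy = swap_adjacent_chars(original, rate=0.2, seed=42)
print(noisy)
```

Embedding both `original` and `noisy` and comparing the two vectors would then show how sensitive the model is to small typos.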

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.StabilityAnalysisSynonyms",
    model=vm_model,
)

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.StabilityAnalysisKeyword",
    model=vm_model,
    params={
        "keyword_dict": {
            'investors': 'shareholders',
            'tech': 'technology',
            'are': 'exist',
            'this': 'that',
            'after': 'post',
            'by': 'via',
            'announced': 'declared',
            'have': 'possess',
            'global': 'worldwide',
            'industry': 'sector',
            'major': 'primary',
        }
    }
)
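
To preview what a `keyword_dict` like the one above does to a text, the sketch below applies whole-word replacements with a regular expression. It only illustrates the idea: the `replace_keywords` helper is hypothetical, and the case-sensitive, whole-word matching it uses is an assumption rather than a description of how the ValidMind test substitutes keywords.

```python
import re


def replace_keywords(text, keyword_dict):
    """Replace each whole-word occurrence of a key with its mapped value."""
    if not keyword_dict:
        return text
    # \b anchors keep 'tech' from matching inside 'technology'.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in keyword_dict) + r")\b"
    )
    return pattern.sub(lambda match: keyword_dict[match.group(1)], text)


sample = "tech firms are rushing; the industry needs technology"
swapped = replace_keywords(
    sample,
    {"tech": "technology", "are": "exist", "industry": "sector"},
)
print(swapped)  # technology firms exist rushing; the sector needs technology
```

Because matching here is case-sensitive, a capitalized word such as "Investors" at the start of a sentence would be left unchanged by a lowercase key.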

In [None]:
result = run_test(
    "validmind.model_validation.embeddings.StabilityAnalysisTranslation",
    model=vm_model,
)