## NeuroFinder Processing Tool - Jupyter Notebook Version

Welcome to the NeuroFinder Processing Tool. This Jupyter Notebook provides a non-GUI interface to run the project and get familiar with its functionalities. You can process your data files, update databases, and generate reports directly within this notebook.

The NeuroFinder Processing Tool automates the management of a comprehensive database containing company information related to neurotechnology. It facilitates the import, standardization, validation, and updating of company data files in multiple formats (e.g., CSV, Excel).

### Objective of This Notebook

This notebook aims to:
* Provide an interactive environment to run the NeuroFinder Processing Tool without the GUI.
* Allow you to load data files, process them, and export the results.
* Help you get familiar with the tool's functionalities.

### Prerequisites

Before running this notebook, ensure you have:

* Python 3.x installed.
* Necessary Python packages (we will install them in the next step).
* Access to the data files you wish to process.
* The main database files (main_database.xlsx, not_neurotech_database.xlsx).

In [1]:
# Install required packages
# (sqlite3 ships with the Python standard library, so it is not installed via pip)
!pip install pandas openpyxl requests python-dotenv matplotlib seaborn


Collecting seaborn
  Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
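
Note that sqlite3 is part of the Python standard library and has no pip distribution, and that some packages are imported under a different name than they are installed under (python-dotenv is imported as dotenv). The following is a minimal sketch for checking which modules are importable in the current kernel. The helper name missing_modules and the module list are illustrative assumptions, not part of the NeuroFinder code base.

```python
import importlib.util

# Import names (not pip names) of the modules this notebook relies on.
# Assumption: this list mirrors the install cell; adjust it to your setup.
REQUIRED_MODULES = [
    "pandas", "openpyxl", "requests", "dotenv",
    "matplotlib", "seaborn", "sqlite3",
]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# An empty list means every module can be imported.
print(missing_modules(REQUIRED_MODULES))
```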


In [2]:
!python -m pip install --upgrade pip




In [3]:
# Import standard libraries
import os
import re
import unicodedata
from datetime import datetime as dt

# Import third-party libraries
import pandas as pd
from dotenv import load_dotenv

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")


### Loading Environment Variables

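If you have a .env file with environment variables, you can load it using python-dotenv. Any variable that is not defined there (or in your shell environment) is returned as None by os.getenv, so set those paths manually before continuing.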

In [4]:
# Load environment variables
load_dotenv()
MAIN_DB_PATH = os.getenv('MAIN_DB_PATH')
NOT_NEUROTECH_DB_PATH = os.getenv('NOT_NEUROTECH_DB_PATH')
NEW_COMPANIES_PATH = os.getenv('NEW_COMPANIES_PATH')
UPDATED_COMPANIES_PATH = os.getenv('UPDATED_COMPANIES_PATH')
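
One possible way to fall back to default paths when a variable is missing is sketched below. The two database file names come from the prerequisites list. The other two defaults and the helper resolve_paths are hypothetical placeholders and should be replaced with your own locations.

```python
import os

# Defaults used when an environment variable is unset or empty.
DEFAULT_PATHS = {
    "MAIN_DB_PATH": "main_database.xlsx",
    "NOT_NEUROTECH_DB_PATH": "not_neurotech_database.xlsx",
    "NEW_COMPANIES_PATH": "new_companies.xlsx",          # hypothetical default
    "UPDATED_COMPANIES_PATH": "updated_companies.xlsx",  # hypothetical default
}

def resolve_paths(env=os.environ, defaults=DEFAULT_PATHS):
    """Return a dict of paths, preferring values from env over the defaults."""
    return {name: env.get(name) or default for name, default in defaults.items()}

paths = resolve_paths()
MAIN_DB_PATH = paths["MAIN_DB_PATH"]
NOT_NEUROTECH_DB_PATH = paths["NOT_NEUROTECH_DB_PATH"]
print(paths)
```

Call resolve_paths after load_dotenv() so that values from the .env file take precedence over the defaults.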


### Defining Helper Functions

In [5]:
def clean_value(value):
    """Cleans the input value by stripping unwanted characters and converting to int if possible."""
    if pd.isna(value):
        return value
    cleaned_value = str(value).strip('="')
    try:
        return int(cleaned_value)
    except ValueError:
        return cleaned_value

def clean_dataframe(filepath, file_type='csv'):
    """Reads a file into a DataFrame, cleans it, and returns the cleaned DataFrame."""
    read_function = pd.read_csv if file_type == 'csv' else pd.read_excel
    df = read_function(filepath, index_col=False,
                       engine='openpyxl' if file_type == 'excel' else None)
    if 'former company names' in df.columns:
        df['former company names'] = df['former company names'].astype(str)
    for col in df.columns:
        df[col] = df[col].apply(clean_value)
    return df

def escape_special_characters(name: str) -> str:
    """Replaces every character other than letters, digits, hyphens, and underscores with an underscore."""
    # The hyphen is placed last in the character class so it is read as a literal
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)
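
To see what these helpers do in practice, here is a small self-contained sketch that repeats the two value-level helpers and runs them on made-up sample values. The sample inputs are illustrative only. Values wrapped as ="..." are assumed here to be the kind of spreadsheet-export wrapper that clean_value is meant to strip.

```python
import re
import pandas as pd

def clean_value(value):
    """Strip leading/trailing '=' and '"' characters; convert to int if possible."""
    if pd.isna(value):
        return value
    cleaned_value = str(value).strip('="')
    try:
        return int(cleaned_value)
    except ValueError:
        return cleaned_value

def escape_special_characters(name: str) -> str:
    """Replace anything other than letters, digits, '-' and '_' with '_'."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

# Illustrative inputs (not taken from the real data files)
samples = ['="12345"', '="BioCatch"', "946", 3.5, float("nan")]
cleaned = [clean_value(v) for v in samples]
print(cleaned)

# The dot before a file extension is also replaced, so escape only the stem
print(escape_special_characters("CB july/24"))
print(escape_special_characters("CB_july24.csv"))
```

Two behaviors are worth keeping in mind: non-integer numbers such as 3.5 come back as strings, because only int conversion is attempted, and escape_special_characters replaces the dot of a file extension when it is given a full file name.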


### Initializing the Database Handler

Create an instance of the DbHandler class to manage your databases.

In [6]:
from main.backend import DbHandler
# Initialize the database handler
db_handler = DbHandler(MAIN_DB_PATH, NOT_NEUROTECH_DB_PATH)
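
Because the paths come from environment variables, they may be None or point to files that do not exist. A minimal pre-flight check, run before constructing DbHandler, could look like the sketch below. The helper missing_paths is a hypothetical name, and the example values are placeholders.

```python
import os

def missing_paths(paths):
    """Return the labels whose path is unset (None or empty) or not an existing file."""
    return [
        label
        for label, path in paths.items()
        if not path or not os.path.isfile(path)
    ]

# Placeholder values; in the notebook these would be MAIN_DB_PATH and
# NOT_NEUROTECH_DB_PATH as loaded from the environment.
problems = missing_paths({
    "MAIN_DB_PATH": os.getenv("MAIN_DB_PATH"),
    "NOT_NEUROTECH_DB_PATH": os.getenv("NOT_NEUROTECH_DB_PATH"),
})
if problems:
    print("Missing or invalid paths:", ", ".join(problems))
```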

## Review the data

In [7]:
db_handler.main_db.describe()

Unnamed: 0,Total_Funding_Amount,Number_of_Funding_Rounds,Company_Number_of_Investors,Company_Number_of_Investments,Number of Patents,Unnamed: 51,Contact Name
count,142.0,164.0,151.0,6.0,1.0,0.0,0.0
mean,36090060.0,2.579268,3.827815,2.833333,13.0,,
std,110136600.0,2.195882,3.86223,1.32916,,,
min,16000.0,0.0,0.0,1.0,13.0,,
25%,1250000.0,1.0,1.0,2.0,13.0,,
50%,5206974.0,2.0,2.0,3.0,13.0,,
75%,27787500.0,3.0,5.0,4.0,13.0,,
max,920000000.0,13.0,16.0,4.0,13.0,,
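
The count row above shows that some columns hold very few values. A quick way to list such sparse columns is sketched below on a toy DataFrame. The helper sparse_columns and the 50% threshold are illustrative assumptions. With the real data you would pass db_handler.main_db instead of the toy frame.

```python
import pandas as pd

def sparse_columns(df, min_fill=0.05):
    """Return the columns whose share of non-null cells is below min_fill."""
    fill_ratio = df.notna().mean()  # fraction of non-null cells per column
    return fill_ratio[fill_ratio < min_fill].index.tolist()

# Toy data that mimics a mostly empty column and a fully empty one
toy = pd.DataFrame({
    "Company_Name": ["A", "B", "C", "D"],
    "Number of Patents": [13.0, None, None, None],
    "Contact Name": [None, None, None, None],
})
print(sparse_columns(toy, min_fill=0.5))
```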


In [8]:
db_handler.main_db.shape
# 659 companies X 58 columns (features)

(659, 58)

In [9]:
db_handler.main_db.head()

Unnamed: 0,Company_Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech_Category,Market_Category,...,product_stage,Number of Patents,Comments,Unnamed: 51,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 56,Unnamed: 57
0,1E Therapeutics,2024-02-14 00:00:00,yes,True,True,Not only neurotech: 1E's groundbreaking proces...,https://www.1etx.com/,https://finder.startupnationcentral.org/compan...,NeuroPharmacology | NeuroBioTechnology,Biotechnology & Biopharmaceutical,...,,,,,,,,,,
1,2breathe Technologies Ltd,2023-06-08 00:00:00,yes,False,False,Not relevant to Neurotech/IL,2breathe.com/about-us/,,,,...,,,,,,,,,,
2,AcousticView,2024-02-14 00:00:00,yes,True,True,,http://www.acousticview.com/,https://finder.startupnationcentral.org/compan...,Imaging | Neuromonitoring,Medical devices | Medical equipment,...,Released,,,,,,,,,
3,ActiView,2022-12-03 00:00:00,n.a,False,False,,www.actiview.io/,,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,,,,
4,ActualSignal,2024-07-14 00:00:00,No,True,True,,https://www.actualsignal.com/,https://finder.startupnationcentral.org/compan...,NeuroreHabilitation | NeuroDegenerative | Neur...,Digital & Health care,...,,,,,,,,,,


## Review functions

#### Search new companies

In [10]:
# Let's check the shape of the new companies database: 0 companies means the file is empty
db_handler.new_companies_db.shape

(0, 58)

In [11]:
# Let's view the new potential companies from Crunchbase
cb_path = 'main/CB_july24.csv'
cb_new_data = clean_dataframe(cb_path)
cb_new_data.head()

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Full Description,Industries,Headquarters Location,Description,CB Rank (Company)
0,BioCatch,https://www.crunchbase.com/organization/biocatch,2011-01-01,year,BioCatch is the leader in Behavioral Biometric...,"Analytics, Cyber Security, FinTech, Fraud Dete...","Tel Aviv, Tel Aviv, Israel",BioCatch unlocks the power of behavior and del...,946
1,Wearable Devices,https://www.crunchbase.com/organization/wearab...,2014-03-13,day,Wearable Devices Ltd. (NASDAQ: WLDS) is a grow...,"Artificial Intelligence (AI), Augmented Realit...","Yoqne`am `illit, HaZafon, Israel","Developing Mudra, a Brain-Computer Interface ...",3307
2,TechSee,https://www.crunchbase.com/organization/techsee,2015-01-01,year,TechSee is a technology and technical support ...,"Artificial Intelligence (AI), Augmented Realit...","Herzliya, Tel Aviv, Israel",TechSee builds smart visual platforms that ena...,10023
3,Brainsway,https://www.crunchbase.com/organization/brainsway,2003-01-01,year,Brainsway's patented breakthrough technology l...,"Biotechnology, Health Care, Life Science","Jerusalem, Yerushalayim, Israel",Brainsway's patented breakthrough technology l...,10879
4,Cortica,https://www.crunchbase.com/organization/cortica,2007-01-01,year,Cortica is an Israeli company founded in 2007 ...,"Artificial Intelligence (AI), Automotive, Auto...","Tel Aviv, Tel Aviv, Israel",Cortica is a technology company developing AI ...,18456


In [12]:
# Let's check the shape of the new Crunchbase data
cb_new_data.shape

(274, 9)

In [13]:
# Let's start the search process with the Crunchbase file path and data_type set to "cb"
db_handler.start_searching_process(file_path=cb_path, data_type="cb")
db_handler.new_companies_db.shape  # Let's check the shape of the new companies database

(10, 58)
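
The internals of start_searching_process are not shown in this notebook. As a rough illustration of the general idea of finding companies that are not yet in the main database, the sketch below compares normalized names. The functions normalize_name and find_new_companies, the list of legal suffixes, and the toy data are assumptions made for this example and may differ from what DbHandler actually does.

```python
import re
import unicodedata
import pandas as pd

def normalize_name(name):
    """Lowercase, strip accents and punctuation, and drop common legal suffixes."""
    text = unicodedata.normalize("NFKD", str(name))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    text = re.sub(r"\b(ltd|inc|llc)\b", " ", text)
    return " ".join(text.split())

def find_new_companies(incoming, known, incoming_col, known_col):
    """Return the rows of incoming whose normalized name is absent from known."""
    known_keys = set(known[known_col].map(normalize_name))
    is_new = ~incoming[incoming_col].map(normalize_name).isin(known_keys)
    return incoming[is_new]

# Toy frames using the column names seen in the outputs above
known = pd.DataFrame({"Company_Name": ["Brainsway", "2breathe Technologies Ltd"]})
incoming = pd.DataFrame(
    {"Organization Name": ["BrainsWay Ltd.", "Cortica", "2Breathe Technologies"]}
)
new_rows = find_new_companies(incoming, known, "Organization Name", "Company_Name")
print(new_rows)
```

Exact matching on normalized names is deliberately simple. It will miss renamed companies, which is presumably one reason the real data carries a former company names column.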

#### Update new companies

In [14]:
db_handler.update_companies_db.shape

(0, 58)

In [15]:
# Let's view the new potential companies from Crunchbase
cb_path = 'main/CB_july24.csv'
cb_new_data = clean_dataframe(cb_path)
cb_new_data.head()

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Full Description,Industries,Headquarters Location,Description,CB Rank (Company)
0,BioCatch,https://www.crunchbase.com/organization/biocatch,2011-01-01,year,BioCatch is the leader in Behavioral Biometric...,"Analytics, Cyber Security, FinTech, Fraud Dete...","Tel Aviv, Tel Aviv, Israel",BioCatch unlocks the power of behavior and del...,946
1,Wearable Devices,https://www.crunchbase.com/organization/wearab...,2014-03-13,day,Wearable Devices Ltd. (NASDAQ: WLDS) is a grow...,"Artificial Intelligence (AI), Augmented Realit...","Yoqne`am `illit, HaZafon, Israel","Developing Mudra, a Brain-Computer Interface ...",3307
2,TechSee,https://www.crunchbase.com/organization/techsee,2015-01-01,year,TechSee is a technology and technical support ...,"Artificial Intelligence (AI), Augmented Realit...","Herzliya, Tel Aviv, Israel",TechSee builds smart visual platforms that ena...,10023
3,Brainsway,https://www.crunchbase.com/organization/brainsway,2003-01-01,year,Brainsway's patented breakthrough technology l...,"Biotechnology, Health Care, Life Science","Jerusalem, Yerushalayim, Israel",Brainsway's patented breakthrough technology l...,10879
4,Cortica,https://www.crunchbase.com/organization/cortica,2007-01-01,year,Cortica is an Israeli company founded in 2007 ...,"Artificial Intelligence (AI), Automotive, Auto...","Tel Aviv, Tel Aviv, Israel",Cortica is a technology company developing AI ...,18456


In [16]:
db_handler.start_update_process(cb_path, "cb")

In [17]:
db_handler.update_companies_db.shape

(87, 58)
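
How start_update_process decides what to update is not visible from this notebook. As a hedged illustration of one common pandas pattern for this kind of task, the sketch below writes non-null incoming values onto rows that already exist in a main table. The helper apply_updates and the toy data are hypothetical, and the sketch assumes the key column holds unique values in both frames.

```python
import pandas as pd

def apply_updates(main, incoming, key):
    """Return a new DataFrame where non-null incoming values overwrite matching rows."""
    result = main.set_index(key)
    # DataFrame.update aligns on index and shared columns, ignores rows that
    # are absent from result, and skips null values in incoming.
    result.update(incoming.set_index(key))
    return result.reset_index()

main = pd.DataFrame({
    "Company_Name": ["Brainsway", "Cortica"],
    "Website": ["brainsway.example", None],
})
incoming = pd.DataFrame({
    "Company_Name": ["Cortica", "NewCo"],
    "Website": ["cortica.example", "newco.example"],
})
updated = apply_updates(main, incoming, "Company_Name")
print(updated)
```

Rows that exist only in the incoming frame (NewCo here) are left out, which matches the split between a search step for new companies and an update step for known ones.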

In [20]:
# Export the new companies found by the search to a CSV file
db_handler.new_companies_db.to_csv("main/sadf.csv")
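
For exports you want to keep, a descriptive, timestamped file name is easier to track than an ad hoc one. The sketch below builds such a name with the same escaping rule as escape_special_characters. The function build_export_name and the label are illustrative assumptions.

```python
import re
from datetime import datetime

def escape_special_characters(name: str) -> str:
    """Replace anything other than letters, digits, '-' and '_' with '_'."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

def build_export_name(label: str, when: datetime, extension: str = "csv") -> str:
    """Build a file name such as <label>_<YYYYMMDD_HHMMSS>.<extension>."""
    stem = escape_special_characters(label)
    return f"{stem}_{when:%Y%m%d_%H%M%S}.{extension}"

export_name = build_export_name("new companies (CB)", datetime.now())
print(export_name)
```

The resulting name can be passed to to_csv, for example together with index=False, which stops pandas from writing the row index as an extra unnamed column.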