## NeuroFinder Processing Tool - Jupyter Notebook Version

Welcome to the NeuroFinder Processing Tool. This Jupyter Notebook provides a non-GUI interface to run the project and get familiar with its functionalities. You can process your data files, update databases, and generate reports directly within this notebook.

The NeuroFinder Processing Tool automates the management of a comprehensive database containing company information related to neurotechnology. It facilitates the import, standardization, validation, and updating of company data files in multiple formats (e.g., CSV, Excel).

### Objective of This Notebook

This notebook aims to:
* Provide an interactive environment to run the NeuroFinder Processing Tool without the GUI.
* Allow you to load data files, process them, and export the results.
* Help you get familiar with the tool's functionalities.

### Prerequisites

Before running this notebook, ensure you have:

* Python 3.x installed.
* Necessary Python packages (we will install them in the next step).
* Access to the data files you wish to process.
* The main database files (main_database.xlsx, not_neurotech_database.xlsx).

In [None]:
# Install required packages
# sqlite3 ships with the Python standard library, so it is not installed through pip
!pip install pandas openpyxl requests python-dotenv matplotlib seaborn


In [None]:
# Upgrade pip (use python.exe instead of python if your Windows setup requires it)
!python -m pip install --upgrade pip


In [2]:
# Import standard libraries
import os
import re
import unicodedata
from datetime import datetime as dt

# Import third-party libraries
import pandas as pd
from dotenv import load_dotenv

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")


### Loading Environment Variables

If you have a .env file with environment variables, load it with python-dotenv. Any variable that is not defined comes back as `None` from `os.getenv`, so set those paths manually before continuing.

In [3]:
# Load environment variables
load_dotenv()
MAIN_DB_PATH = os.getenv('MAIN_DB_PATH')
NOT_NEUROTECH_DB_PATH = os.getenv('NOT_NEUROTECH_DB_PATH')
NEW_COMPANIES_PATH = os.getenv('NEW_COMPANIES_PATH')
UPDATED_COMPANIES_PATH = os.getenv('UPDATED_COMPANIES_PATH')
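
One way to fall back to default paths when a variable is missing is sketched below. The fallback locations in `DEFAULT_PATHS` and the helper `resolve_path` are assumptions made for illustration, not part of the tool; adjust them to your own folder layout. If you use a .env file, call `load_dotenv()` first so its values are visible to `os.getenv`.

```python
import os
from pathlib import Path

# Hypothetical fallback locations (assumed for this sketch); edit to match your project.
DEFAULT_PATHS = {
    "MAIN_DB_PATH": "main/data/main_database.xlsx",
    "NOT_NEUROTECH_DB_PATH": "main/data/not_neurotech_database.xlsx",
}


def resolve_path(var_name: str, defaults: dict = DEFAULT_PATHS) -> Path:
    """Return the path from the environment, or the fallback if the variable is unset or empty."""
    value = os.getenv(var_name) or defaults[var_name]
    return Path(value)


# Report which source each path came from and whether the file is present.
for name in DEFAULT_PATHS:
    path = resolve_path(name)
    status = "found" if path.exists() else "missing"
    print(f"{name} -> {path} ({status})")
```

Checking `path.exists()` up front surfaces a wrong path here rather than later, when the database handler tries to open the file.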


### Defining Helper Functions

In [4]:
def clean_value(value):
    """Cleans the input value by stripping unwanted characters and converting to int if possible."""
    if pd.isna(value):
        return value
    cleaned_value = str(value).strip('="')
    try:
        return int(cleaned_value)
    except ValueError:
        return cleaned_value

def clean_dataframe(filepath, file_type='csv'):
    """Reads a file into a DataFrame, cleans it, and returns the cleaned DataFrame."""
    read_function = pd.read_csv if file_type == 'csv' else pd.read_excel
    df = read_function(filepath, index_col=False,
                       engine='openpyxl' if file_type == 'excel' else None)
    if 'former company names' in df.columns:
        df['former company names'] = df['former company names'].astype(str)
    for col in df.columns:
        df[col] = df[col].apply(clean_value)
    return df

def escape_special_characters(name: str) -> str:
    """Replaces special characters in a filename with underscores to ensure compatibility."""
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)
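
A quick way to see what the helpers do is to run them on a few sample values. The snippet below repeats `clean_value` and `escape_special_characters` from the cell above so that it runs on its own; the sample inputs are made up for illustration.

```python
import re

import pandas as pd


def clean_value(value):
    """Same logic as the helper above, repeated so this snippet is self-contained."""
    if pd.isna(value):
        return value
    cleaned_value = str(value).strip('="')
    try:
        return int(cleaned_value)
    except ValueError:
        return cleaned_value


def escape_special_characters(name: str) -> str:
    """Same logic as the helper above: anything outside letters, digits, '-' and '_' becomes '_'."""
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)


# Spreadsheet exports sometimes wrap values as ="..." to force text formatting.
samples = ['="123"', '="Tel Aviv"', 42, "3.5", float("nan")]
for raw in samples:
    cleaned = clean_value(raw)
    print(repr(raw), "->", repr(cleaned), type(cleaned).__name__)

print(escape_special_characters("crunchbase search (jan25).csv"))
```

Note that `int()` rejects decimal strings, so a value such as `3.5` comes back as the string `'3.5'` rather than a number. If numeric columns with decimals matter for your analysis, convert them afterwards, for example with `pd.to_numeric`.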


### Initializing the Database Handler

Create an instance of the DbHandler class to manage your databases.

In [5]:
from main.backend import DbHandler
# Initialize the database handler
db_handler = DbHandler(MAIN_DB_PATH, NOT_NEUROTECH_DB_PATH)

### Reviewing the Data

In [6]:
db_handler.main_db.describe()

Unnamed: 0,Company Founded Year,Last Funding Amount,Total Funding Amount,Number of Funding Rounds,Company Number of Investors,Company Number of Investments,acquired,Inactive Year,Number of Patents,Unnamed: 53,Contact Name
count,262.0,93.0,108.0,124.0,116.0,6.0,95.0,54.0,1.0,0.0,0.0
mean,2013.454198,9096734.0,25796170.0,2.548387,3.939655,3.0,0.105263,2019.222222,13.0,,
std,11.614324,18588340.0,69100250.0,2.123637,4.175058,1.264911,0.30852,3.451369,,,
min,1905.0,10000.0,16000.0,0.0,0.0,1.0,0.0,2007.0,13.0,,
25%,2011.0,850000.0,1310000.0,1.0,1.0,2.25,0.0,2017.0,13.0,,
50%,2016.0,2200000.0,4000000.0,2.0,2.0,3.5,0.0,2019.0,13.0,,
75%,2019.0,10000000.0,22117500.0,3.0,5.0,4.0,0.0,2022.0,13.0,,
max,2024.0,150000000.0,556900000.0,10.0,20.0,4.0,1.0,2024.0,13.0,,
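
The summary above includes `Unnamed: 53`, a column with a count of 0. pandas generates `Unnamed: N` headers for spreadsheet columns that have no header text. A minimal sketch for dropping such columns when they hold no data follows; `drop_unnamed_columns` is a hypothetical helper, not part of the tool, and the demo frame is made up.

```python
import pandas as pd


def drop_unnamed_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that have an auto-generated 'Unnamed: N' header and contain no data."""
    keep = [
        not (str(col).startswith("Unnamed:") and frame[col].isna().all())
        for col in frame.columns
    ]
    return frame.loc[:, keep]


# Small made-up frame that mimics the layout seen in the summary above.
demo = pd.DataFrame(
    {
        "Company Name": ["Alpha Omega", "AcousticView"],
        "Number of Patents": [13.0, None],
        "Unnamed: 53": [None, None],
        "Contact Name": [None, None],
    }
)
cleaned = drop_unnamed_columns(demo)
print(list(cleaned.columns))
```

Named columns that happen to be empty, such as `Contact Name`, are kept on purpose, since they are part of the database schema.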


In [7]:
db_handler.main_db.shape
# shape is (rows, columns): one row per company, one column per feature in the main database

(273, 60)

In [9]:
db_handler.main_db.head()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Inactive Year,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59
0,AcousticView,2024-02-14 00:00:00,yes,True,True,,http://www.acousticview.com/,https://finder.startupnationcentral.org/compan...,Imaging | Neuromonitoring,Medical devices | Medical equipment,...,,,,,,,,,,
1,ActualSignal,2024-07-14 00:00:00,No,True,True,,https://www.actualsignal.com/,https://finder.startupnationcentral.org/compan...,NeuroreHabilitation | NeuroDegenerative | Neur...,Digital & Health care,...,,,,,,,,,,
2,Adam CogTech,2024-02-14 00:00:00,yes,True,True,,http://adam-cogtec.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,,אסף הראל,,
3,AlgoSensus,2024-02-14 00:00:00,yes,True,True,,https://www.algosensus.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Medical devices | Medical equipment,...,,,,,,,,,,
4,Alpha Omega,2024-02-14 00:00:00,yes,True,True,,http://www.alphaomega-eng.com,https://finder.startupnationcentral.org/compan...,NeuroSurgery | NeuroDevices,Medical devices | Medical equipment,...,,13.0,,,,,,,,


## Review functions

#### Search new companies

In [10]:
# Let's check the shape of the new companies database: 0 rows means it is still empty
db_handler.new_companies_db.shape

(0, 60)

In [20]:
# Let's view the potential new companies from Crunchbase
cb_path = "jan25/crunchbase search.csv"
cb_new_data = clean_dataframe(cb_path)
cb_new_data.head() 

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Full Description,Industries,Headquarters Location,Description,CB Rank (Company)
0,Firefly Neuroscience,https://www.crunchbase.com/organization/firefl...,2006-01-01,year,"Firefly Neuroscience utilizes big data, signal...","Artificial Intelligence (AI), Big Data, Health...","Herzliya, Tel Aviv, Israel","Firefly Neuroscience utilizes big data, signal...",1117
1,BioCatch,https://www.crunchbase.com/organization/biocatch,2011-01-01,year,BioCatch is the leader in Behavioral Biometric...,"Analytics, Cyber Security, FinTech, Fraud Dete...","Tel Aviv, Tel Aviv, Israel",BioCatch unlocks the power of behavior and del...,1419
2,Wearable Devices,https://www.crunchbase.com/organization/wearab...,2014-03-13,day,Wearable Devices Ltd. (NASDAQ: WLDS) is a grow...,"Artificial Intelligence (AI), Augmented Realit...","Yoqne`am `illit, HaZafon, Israel","Developing Mudra, a Brain-Computer Interface ...",1428
3,NeuroSense Therapeutics,https://www.crunchbase.com/organization/neuros...,2017-01-01,year,"NeuroSense Therapeutics Ltd., is a clinical-st...","Biotechnology, Neuroscience, Therapeutics","Herzliya, Tel Aviv, Israel",NeuroSense Therapeutics is a clinical-stage dr...,2391
4,RailVision,https://www.crunchbase.com/organization/railvi...,2015-01-01,year,Rail Vision is focusing on train safety securi...,"Automotive, Big Data, Image Recognition, Secur...","Ra'anana, HaMerkaz, Israel",RailVision introduces an Automated Early Warni...,4513


In [21]:
# Let's check the shape of the new Crunchbase data
cb_new_data.shape

(265, 9)

In [22]:
# Let's start the search process with the Crunchbase file path and data_type set to "cb"
db_handler.start_searching_process(file_path=cb_path, data_type="cb")
db_handler.new_companies_db.shape  # Check the shape of the new companies database again

(24, 61)
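
Of the 265 Crunchbase rows, 24 ended up in `new_companies_db`. The internals of `start_searching_process` are not shown in this notebook, so the following is only an illustrative sketch of how such a filter could work, assuming companies are matched by a normalized name. `normalize_name` and `find_new_companies` are hypothetical helpers, and the tool's actual matching rules may differ.

```python
import unicodedata

import pandas as pd


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and drop punctuation/spaces so near-identical names compare equal."""
    text = unicodedata.normalize("NFKD", str(name))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_new_companies(incoming: pd.DataFrame, known: pd.DataFrame,
                       incoming_col: str, known_col: str = "Company Name") -> pd.DataFrame:
    """Return the rows of `incoming` whose normalized name is absent from `known`."""
    known_keys = set(known[known_col].map(normalize_name))
    is_new = ~incoming[incoming_col].map(normalize_name).isin(known_keys)
    return incoming[is_new]


# Made-up miniature versions of the main database and a Crunchbase export.
known = pd.DataFrame({"Company Name": ["Alpha Omega", "AcousticView"]})
incoming = pd.DataFrame({"Organization Name": ["alpha-omega", "BioCatch"]})

candidates = find_new_companies(incoming, known, incoming_col="Organization Name")
print(candidates)
```

In this sketch `alpha-omega` is treated as already known because it normalizes to the same key as `Alpha Omega`, so only `BioCatch` remains as a candidate.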

#### Update new companies

In [23]:
db_handler.update_companies_db.shape

(0, 60)

In [25]:
# Let's view the potential new companies from Start-Up Nation Central
tsun_path = 'jan25/brain1.csv'
tsun_new_data = clean_dataframe(tsun_path)
tsun_new_data.head() 

Unnamed: 0,Name,Finder URL,Description,Primary Sector,Founded,Employees,Funding Stage
0,BRAIN.Q,https://finder.startupnationcentral.org/compan...,Restorative Brain Health Therapeutics to Rever...,Health Tech & Life Sciences,2016,11-50,B
1,brain.space,https://finder.startupnationcentral.org/compan...,The Brain Data Company,Data Analysis & Decision Support,2018,11-50,A
2,Brain1,https://finder.startupnationcentral.org/compan...,Brainwave Entrainment Device,Health Tech & Life Sciences,2015,1-10,Pre-Funding
3,BrainBalance,https://finder.startupnationcentral.org/compan...,Minimally Invasive Device for the Treatment of...,Health Tech & Life Sciences,2016,1-10,Pre-Funding
4,BrainCommerce,https://finder.startupnationcentral.org/compan...,AI Sales Agent for E-Commerce,Business Software,2024,1-10,Pre-Funding


In [26]:
# Let's check the shape of the new Start-Up Nation Central data
tsun_new_data.shape

(50, 7)

In [27]:
# Let's start the search process with the Start-Up Nation Central file path and data_type set to "tsun"
db_handler.start_searching_process(file_path=tsun_path, data_type="tsun")
db_handler.new_companies_db.shape

(31, 61)

In [28]:
db_handler.update_companies_db.shape

(0, 60)

In [30]:
df = pd.read_excel('main/data/not_neurotech.xlsx')

In [31]:
df.head()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech_Category,Market_Category,...,acquired,product_stage,Number of Patents,Comments,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57
0,1E Therapeutics,2024-02-14 00:00:00,yes,1.0,True,Not only neurotech,https://www.1etx.com/,https://finder.startupnationcentral.org/compan...,NeuroPharmacology | NeuroBioTechnology,Biotechnology & Biopharmaceutical,...,,,,,,,,,,
1,2breathe Technologies Ltd,2023-06-08 00:00:00,yes,0.0,False,Not relevant to Neurotech/IL,2breathe.com/about-us/,,,,...,,,,,,,,,,
2,4Girls,,,,False,,,,,,...,,,,,,,,,,
3,A-muse,,,,False,INACTIVE,,,,,...,,,,,,,,,,
4,AbiliSense,,,,False,,,,,,...,,,,,,,,,,


In [32]:
df.describe()

Unnamed: 0,"Operation Status (Active=True, False = False)",Company_Founded_Year,Total_Funding_Amount,Total_Funding_Amount_M_dollars,Number_of_Funding_Rounds,Company_Number_of_Investors,Company_Number_of_Investments,Number of Patents,Unnamed: 53,Unnamed: 54,Unnamed: 55
count,153.0,230.0,44.0,32.0,48.0,43.0,3.0,0.0,0.0,0.0,0.0
mean,0.620915,2014.186957,56933460.0,24.020813,2.708333,3.860465,2.0,,,,
std,0.486753,14.894529,165733500.0,33.632631,2.351625,3.937496,1.0,,,,
min,0.0,1892.0,71429.0,0.325,1.0,0.0,1.0,,,,
25%,0.0,2014.0,1000000.0,1.0,1.0,1.0,1.5,,,,
50%,1.0,2018.0,9300000.0,4.705,2.0,2.0,2.0,,,,
75%,1.0,2020.0,41875000.0,36.925,3.25,5.5,2.5,,,,
max,1.0,2023.0,920000000.0,120.0,13.0,16.0,3.0,,,,


In [33]:
df = pd.read_excel("main/data/NeuroTech Industry IL 2024.xlsx")

In [34]:
df.head()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Inactive Year,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59
0,AcousticView,2024-02-14 00:00:00,yes,True,True,,http://www.acousticview.com/,https://finder.startupnationcentral.org/compan...,Imaging | Neuromonitoring,Medical devices | Medical equipment,...,,,,,,,,,,
1,ActualSignal,2024-07-14 00:00:00,No,True,True,,https://www.actualsignal.com/,https://finder.startupnationcentral.org/compan...,NeuroreHabilitation | NeuroDegenerative | Neur...,Digital & Health care,...,,,,,,,,,,
2,Adam CogTech,2024-02-14 00:00:00,yes,True,True,,http://adam-cogtec.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,,אסף הראל,,
3,AlgoSensus,2024-02-14 00:00:00,yes,True,True,,https://www.algosensus.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Medical devices | Medical equipment,...,,,,,,,,,,
4,Alpha Omega,2024-02-14 00:00:00,yes,True,True,,http://www.alphaomega-eng.com,https://finder.startupnationcentral.org/compan...,NeuroSurgery | NeuroDevices,Medical devices | Medical equipment,...,,13.0,,,,,,,,


In [38]:
col_lst = df.columns
for category in col_lst:
    print(category) 

Company Name
Updating_Date
Logo in Visualization folder?
Operation Status (Active=True, False = False)
INCLUSION
Operation/relevant Notes
Website
Startup Nation Page
Neurotech Category
Market Category
Target Market
TechTools 1
TechTools 2
TechTools 3
Finder Description
Description
Full Description
CB (Crunchbase) Link
Company CB Categories
Company Location
Company Founded Year
Company Number of Employees
Company CB Rank
Funding Status
Last Funding Type
Last Funding Date
Last Funding Amount
Total Funding Amount
Total Funding Amount M dollars
Number of Funding Rounds
Estimated Revenue Range
Company Number of Investors
Company Number of Investments
Company LinkedIn Link
Company LinkedIn Followers Number
Company Contact Email
Company phone number
Founders
CSO
CTO
CEO
COO
CFO
Co-Founder
President
Team Members
Address
former company names
acquired
Product Stage
Inactive Year
Number of Patents
Comments
Unnamed: 53
Contact Name
Contact Phone Number / Email
האם יצרנו איתם כבר קשר? (כדי לא להתיש