# NeuroFinder Processing Tool - Jupyter Notebook Version

### The NeuroFinder Processing Tool automates the management of a comprehensive database containing company information related to neurotechnology. It facilitates the import, standardization, validation, and updating of company data files in multiple formats (e.g., CSV, Excel).

#### This notebook provides a non-GUI interface for running the project and getting familiar with **some** of its back-end functionality:

##### - Class
##### - Cleaners
##### - Functions - operations, search, update, exports

### Objective of This Notebook

This notebook aims to:
* Provide an interactive environment to run the NeuroFinder Processing Tool without the GUI.
* Allow you to load data files, process them, and export the results.
* Help you get familiar with the tool's functionalities.

### Prerequisites

Before running this notebook, ensure you have:

* Python 3.x installed.
* Necessary Python packages (we will install them in the next step).
* Access to the data files you wish to process.
* The main database files (main_database.xlsx, not_neurotech_database.xlsx).

In [1]:
# Install required packages
# (sqlite3 is part of the Python standard library, so it is not installed with pip)
!pip install pandas openpyxl requests python-dotenv matplotlib seaborn


Collecting seaborn
  Using cached seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
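
`sqlite3` cannot be installed from PyPI because it ships with Python itself. A quick check, sketched here with only the standard library, confirms that the module is available in the current kernel:

```python
import sqlite3

# sqlite3 is bundled with CPython, so a plain import is enough.
print("SQLite library version:", sqlite3.sqlite_version)

# Sanity check against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
result = conn.execute("SELECT 1 + 1").fetchone()[0]
conn.close()
print("1 + 1 =", result)
```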


In [2]:
# Upgrade pip; the %pip magic targets the environment of the running kernel on any OS
%pip install --upgrade pip




## Imports and environment settings

* If you have a .env file with environment variables, you can load it using python-dotenv. Otherwise, we can set default paths.

In [3]:
# Import standard libraries
import os
import re
import unicodedata
from datetime import datetime as dt

# Import third-party libraries
import pandas as pd
from dotenv import load_dotenv

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")


In [4]:
# Load environment variables
load_dotenv()
MAIN_DB_PATH = os.getenv('MAIN_DB_PATH')
NOT_NEUROTECH_DB_PATH = os.getenv('NOT_NEUROTECH_DB_PATH')
NEW_COMPANIES_PATH = os.getenv('NEW_COMPANIES_PATH')
UPDATED_COMPANIES_PATH = os.getenv('UPDATED_COMPANIES_PATH')
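
If no `.env` file is present, `os.getenv` returns `None` for each of the paths above. One possible way to fall back to default paths is sketched below. The two database file names are taken from the prerequisites list; the two export file names and the `resolve_path` helper are assumptions made for illustration only.

```python
import os

# Fallback values used when a variable is missing from the environment.
# The export file names are hypothetical placeholders.
DEFAULT_PATHS = {
    "MAIN_DB_PATH": "main_database.xlsx",
    "NOT_NEUROTECH_DB_PATH": "not_neurotech_database.xlsx",
    "NEW_COMPANIES_PATH": "new_companies.xlsx",
    "UPDATED_COMPANIES_PATH": "updated_companies.xlsx",
}


def resolve_path(name):
    """Return the environment value for `name`, or its default when unset or empty."""
    return os.getenv(name) or DEFAULT_PATHS[name]


paths = {name: resolve_path(name) for name in DEFAULT_PATHS}
for name, value in paths.items():
    print(f"{name} = {value}")
```

The resolved values can then be passed to `DbHandler` in place of the raw `os.getenv` results.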


# Class

### Initializing the Database Handler

Create an instance of the DbHandler class to manage your databases.

In [5]:
from main.backend import DbHandler
# Initialize the database handler
db_handler = DbHandler(MAIN_DB_PATH, NOT_NEUROTECH_DB_PATH)


In [6]:
help(db_handler)

Help on DbHandler in module main.backend object:

class DbHandler(builtins.object)
 |  DbHandler(main_db_path, not_neurotech_path)
 |  
 |  Handles a data files from tsun, cb, pb and others
 |  
 |  Methods defined here:
 |  
 |  __init__(self, main_db_path, not_neurotech_path)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  clear_new_db(self)
 |      Clears new database
 |  
 |  export_new(self, path)
 |      Exports new database to excel
 |  
 |  export_updates(self, path)
 |      Exports the updates database to an Excel file.
 |  
 |  find_new_companies_cb(self)
 |      Processes Crunchbase (CB) records:
 |      - If the company exists in main_db or not_neurotech_db, it is skipped.
 |      - If the company exists in new_companies_db, update the missing CB-specific fields:
 |              'CB (Crunchbase) Link', 'Company_Location', 'Full Description', and 'Company CB Rank'.
 |      - Otherwise, add a new row containing all CB data.
 |  
 |  find_new_co

# Check out the database

In [7]:
# Head
db_handler.main_db.head()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59,Normalized_Company_Name
0,AcousticView,2024-02-14 00:00:00,yes,True,True,,http://www.acousticview.com/,https://finder.startupnationcentral.org/compan...,Imaging | Neuromonitoring,Medical devices | Medical equipment,...,,,,,,,,,,acousticview
1,ActualSignal,2024-07-14 00:00:00,No,True,True,,https://www.actualsignal.com/,https://finder.startupnationcentral.org/compan...,NeuroreHabilitation | NeuroDegenerative | Neur...,Digital & Health care,...,,,,,,,,,,actualsignal
2,Adam CogTech,2024-02-14 00:00:00,yes,True,True,website does work,http://adam-cogtec.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,אסף הראל,,,adamcogtech
3,AlgoSensus,2024-02-14 00:00:00,yes,True,True,website does work,https://www.algosensus.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Medical devices | Medical equipment,...,,,,,,,,,,algosensus
4,Alpha Omega,2024-02-14 00:00:00,yes,True,True,,http://www.alphaomega-eng.com,https://finder.startupnationcentral.org/compan...,NeuroSurgery | NeuroDevices,Medical devices | Medical equipment,...,13.0,,,,,,,,,alphaomega


In [8]:
# shapes
print(db_handler.main_db.shape)
print(db_handler.not_neurotech_db.shape)

(273, 61)
(590, 59)


In [9]:
# columns
print(db_handler.main_db.columns)

Index(['Company Name', 'Updating_Date', 'Logo in Visualization folder?',
       'Operation Status (Active=True, False = False)', 'INCLUSION',
       'Operation/relevant Notes', 'Website', 'Startup Nation Page',
       'Neurotech Category', 'Market Category', 'Target Market', 'TechTools 1',
       'TechTools 2', 'TechTools 3', 'Finder Description', 'Description',
       'Full Description', 'CB (Crunchbase) Link', 'Company CB Categories',
       'Company Location', 'Company Founded Year',
       'Company Number of Employees', 'Company CB Rank', 'Funding Status',
       'Last Funding Type', 'Last Funding Date', 'Last Funding Amount',
       'Total Funding Amount', 'Total Funding Amount M dollars',
       'Number of Funding Rounds', 'Estimated Revenue Range',
       'Company Number of Investors', 'Company Number of Investments',
       'Company LinkedIn Link', 'Company LinkedIn Followers Number',
       'Company Contact Email', 'Company phone number', 'Founders', 'CSO',
       'CTO', 'CEO'

In [10]:
# info
db_handler.main_db.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 273 entries, 0 to 272
Data columns (total 61 columns):
 #   Column                                         Non-Null Count  Dtype  
---  ------                                         --------------  -----  
 0   Company Name                                   273 non-null    object 
 1   Updating_Date                                  240 non-null    object 
 2   Logo in Visualization folder?                  259 non-null    object 
 3   Operation Status (Active=True, False = False)  264 non-null    object 
 4   INCLUSION                                      273 non-null    object 
 5   Operation/relevant Notes                       103 non-null    object 
 6   Website                                        218 non-null    object 
 7   Startup Nation Page                            239 non-null    object 
 8   Neurotech Category                             225 non-null    object 
 9   Market Category                                242 non

In [11]:
# describe
db_handler.main_db.describe()

Unnamed: 0,Company Founded Year,Last Funding Amount,Total Funding Amount,Number of Funding Rounds,Company Number of Investors,Company Number of Investments,acquired,Inactive Year,Number of Patents,Unnamed: 53,Contact Name
count,262.0,93.0,108.0,124.0,116.0,6.0,95.0,54.0,1.0,0.0,0.0
mean,2013.454198,9096734.0,25796170.0,2.548387,3.939655,3.0,0.105263,2019.222222,13.0,,
std,11.614324,18588340.0,69100250.0,2.123637,4.175058,1.264911,0.30852,3.451369,,,
min,1905.0,10000.0,16000.0,0.0,0.0,1.0,0.0,2007.0,13.0,,
25%,2011.0,850000.0,1310000.0,1.0,1.0,2.25,0.0,2017.0,13.0,,
50%,2016.0,2200000.0,4000000.0,2.0,2.0,3.5,0.0,2019.0,13.0,,
75%,2019.0,10000000.0,22117500.0,3.0,5.0,4.0,0.0,2022.0,13.0,,
max,2024.0,150000000.0,556900000.0,10.0,20.0,4.0,1.0,2024.0,13.0,,


# Cleaners

In [12]:
from main.backend import clean_value, clean_dataframe, escape_special_characters
value = '="noisvalue'
print(clean_value(value))

special_value = "nosie#@!$val??ue"
print(escape_special_characters(special_value))

noisvalue
nosie____val__ue
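
The source of `clean_value` and `escape_special_characters` is not shown in this notebook. The sketch below is only a guess at logic that would reproduce the two outputs above; the `_sketch` functions are hypothetical stand-ins, not the functions in `main.backend`.

```python
import re


def clean_value_sketch(value):
    """Strip whitespace, a leading '=', and surrounding double quotes (assumed behavior)."""
    if not isinstance(value, str):
        return value  # leave numbers, None, etc. untouched
    return value.strip().lstrip("=").strip('"')


def escape_special_characters_sketch(text):
    """Replace every character that is not an ASCII letter or digit with '_' (assumed behavior)."""
    return re.sub(r"[^A-Za-z0-9]", "_", text)


print(clean_value_sketch('="noisvalue'))
print(escape_special_characters_sketch("nosie#@!$val??ue"))
```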


In [13]:
# Build a one-row DataFrame that contains a noisy value
noise_data = {
    'col1': 'va!@#!@#le1',
    'col2': 'value2',
    'col3': 'value3',
}

noise_db = pd.DataFrame([noise_data])
noise_db.to_excel('main/noise_db.xlsx', index=False)
noise_db.head()

Unnamed: 0,col1,col2,col3
0,va!@#!@#le1,value2,value3


In [14]:
new = clean_dataframe('main/noise_db.xlsx', 'excel')
new.head()

Unnamed: 0,col1,col2,col3
0,va!@#!@#le1,value2,value3


# Functions

In [15]:
db_handler.main_db.shape
# 273 companies X 61 categories

(273, 61)

In [16]:
db_handler.not_neurotech_db.shape
# 590 companies X 59 categories

(590, 59)

### Is the company in the database?

In [31]:
true_cases = [
    "Thrombotech Ltd",  # Company name in the correct format
    "THROMBOTECH LTD",  # Company name in uppercase
    "thrombotech ltd",  # Company name in lowercase
    "Thrombotech--!@#$$%^&    *()[].'/,-- Ltd",  # Company name with special characters
    "T-h-r-o-m-b-o-t-e-c-h- -L-t-d-",  # Company name with hyphens
    "Thrombotech_ Ltd",  # Company name with an underscore between words
]

for company_name in true_cases:
    in_main = db_handler.is_company_in_database(company_name, db_handler.main_db)
    in_not_neuro_tech = db_handler.is_company_in_database(company_name, db_handler.not_neurotech_db)
    # Print the company name and whether it was found in the main database.
    print(f"{company_name}) {in_main}")


Thrombotech Ltd) True
THROMBOTECH LTD) True
thrombotech ltd) True
Thrombotech--!@#$$%^&    *()[].'/,-- Ltd) True
T-h-r-o-m-b-o-t-e-c-h- -L-t-d-) True
Thrombotech_ Ltd) True


In [32]:
false_cases = [
    "ThrombotechLtd",  # Company name without a space between words
    "Thromboteech Ltd",  # Company name with a typo
]

for company_name in false_cases:
    in_main = db_handler.is_company_in_database(company_name, db_handler.main_db)
    in_not_neuro_tech = db_handler.is_company_in_database(company_name, db_handler.not_neurotech_db)
    # Print the company name and whether it was found in the main database.
    print(f"{company_name}) {in_main}")

ThrombotechLtd) False
Thromboteech Ltd) False
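
The results above are consistent with matching on a normalized name in which punctuation is removed, case is ignored, and a legal suffix such as "Ltd" is dropped only when it stands as a separate word. The sketch below is an assumption about that logic, not the code behind `is_company_in_database`; `normalize_name_sketch` and the suffix list are hypothetical.

```python
import re

# Assumed list of legal suffixes; the real tool may use a different one.
LEGAL_SUFFIXES = {"ltd", "inc", "llc", "corp"}


def normalize_name_sketch(name):
    # Drop punctuation and underscores; keep letters, digits, and whitespace.
    stripped = re.sub(r"[^\w\s]|_", "", name.lower())
    # Split on whitespace so a suffix is removed only when it is its own word.
    tokens = [token for token in stripped.split() if token not in LEGAL_SUFFIXES]
    return "".join(tokens)


true_cases = [
    "Thrombotech Ltd",
    "THROMBOTECH LTD",
    "thrombotech ltd",
    "Thrombotech--!@#$$%^&    *()[].'/,-- Ltd",
    "T-h-r-o-m-b-o-t-e-c-h- -L-t-d-",
    "Thrombotech_ Ltd",
]
false_cases = ["ThrombotechLtd", "Thromboteech Ltd"]

reference = normalize_name_sketch("Thrombotech Ltd")
for candidate in true_cases + false_cases:
    print(f"{candidate!r}: {normalize_name_sketch(candidate) == reference}")
```

Under this sketch, "ThrombotechLtd" stays "thrombotechltd" because the suffix is not a separate word, which would explain why it is not matched.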


## Search process

In [19]:
# a new database initialized with the new companies (currently empty)
db_handler.new_companies_db.shape

(0, 61)

In [20]:
# Search for new companies in the Start-Up Nation Central export
brain_path = 'jan25/brain1.csv'
db_handler.start_searching_process(brain_path, "tsun")

# The new-companies database may now contain rows
db_handler.new_companies_db.shape

(7, 61)

In [21]:
db_handler.new_companies_db.tail()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59,Normalized_Company_Name
2,BrainStorm,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,brainstorm
3,IronBrain,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,ironbrain
4,Bio Micro Science,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,biomicroscience
5,BioPass Pharma,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,biopasspharma
6,Cogntiv,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,cogntiv


In [22]:
# Search for new potential companies in the Crunchbase export
cb_path = "jan25/crunchbase search.csv"
db_handler.start_searching_process(cb_path, "cb")

In [23]:
# Check the shape of the new-companies database after adding the Crunchbase data
db_handler.new_companies_db.shape


(22, 62)
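
The three rules listed in the `find_new_companies_cb` docstring (skip known companies, fill missing Crunchbase fields for companies already queued as new, otherwise add a new row) can be sketched in plain Python. This is a minimal illustration on toy records, not the tool's implementation: `merge_cb_records` and `is_missing` are hypothetical, the company names are placeholders, and names are compared as-is rather than in normalized form.

```python
# Crunchbase-specific fields named in the docstring.
CB_FIELDS = ["CB (Crunchbase) Link", "Company_Location", "Full Description", "Company CB Rank"]


def is_missing(value):
    """True for None, empty strings, and NaN (NaN is the only value not equal to itself)."""
    return value is None or value == "" or value != value


def merge_cb_records(cb_rows, known_names, new_rows):
    """Apply the three docstring rules to lists of row dictionaries."""
    merged = {row["Company Name"]: dict(row) for row in new_rows}
    for row in cb_rows:
        name = row["Company Name"]
        if name in known_names:
            continue  # already in main_db or not_neurotech_db: skip
        if name in merged:
            # Already queued as new: fill only the CB fields that are still missing.
            for field in CB_FIELDS:
                if is_missing(merged[name].get(field)) and not is_missing(row.get(field)):
                    merged[name][field] = row[field]
        else:
            merged[name] = dict(row)  # unseen company: add all CB data
    return list(merged.values())


# Toy data for illustration only.
known = {"Company A"}
queued = [{"Company Name": "Company B", "Company_Location": None}]
crunchbase = [
    {"Company Name": "Company A", "Company_Location": "City 1"},
    {"Company Name": "Company B", "Company_Location": "City 2", "Company CB Rank": 100},
    {"Company Name": "Company C", "Company_Location": "City 3"},
]

result = merge_cb_records(crunchbase, known, queued)
for record in result:
    print(record)
```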

In [24]:
db_handler.new_companies_db.tail()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59,Normalized_Company_Name,Company_Location
17,New Bio Technology,,,,,,,,,,...,,,,,,,,,newbiotechnology,"Or Akiva, Hefa, Israel"
18,Slavgroup,,,,,,,,,,...,,,,,,,,,slavgroup,"Rosh Ha'ayin, HaMerkaz, Israel"
19,Insight Sparks,,,,,,,,,,...,,,,,,,,,insightsparks,"Tel Aviv, Tel Aviv, Israel"
20,NEURONIX,,,,,,,,,,...,,,,,,,,,neuronix,"Yoqne`am `illit, HaZafon, Israel"
21,CogniZance,,,,,,,,,,...,,,,,,,,,cognizance,"Tel Aviv, Tel Aviv, Israel"
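
The extra 62nd column appears to be `Company_Location`, which the Crunchbase rows carry alongside the `Company Location` column (with a space) listed for the main database. If a single location column is wanted before export, one option is to fold the two together. The sketch below uses toy data and assumes both columns hold the same kind of value.

```python
import pandas as pd

# Toy frame with both spellings of the location column (placeholder values).
new_companies = pd.DataFrame({
    "Company Name": ["Company A", "Company B"],
    "Company Location": [None, "City 2"],
    "Company_Location": ["City 1", None],
})

# Keep existing 'Company Location' values and fill gaps from 'Company_Location'.
new_companies["Company Location"] = new_companies["Company Location"].combine_first(
    new_companies["Company_Location"]
)
new_companies = new_companies.drop(columns="Company_Location")

print(new_companies.shape)
print(new_companies)
```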


## Update process

In [25]:
db_handler.update_companies_db.shape

(0, 61)

In [26]:
# Collect updates for existing companies from the Start-Up Nation Central export
brain_path = 'jan25/brain1.csv'
db_handler.start_update_process(brain_path, "tsun")

# The update database may now contain rows
db_handler.update_companies_db.shape

(15, 61)

In [27]:
db_handler.update_companies_db.tail()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59,Normalized_Company_Name
10,Bioimmunate Technologies,,,,,,,,,,...,,,,,,,,,,
11,Bionaut Labs,,,,,,,https://finder.startupnationcentral.org/compan...,,,...,,,,,,,,,,
12,BioXtreme,,,,,,,,,,...,,,,,,,,,,
13,CogniFit,,,,,,,,,,...,,,,,,,,,,
14,CorrActions,,,,,,,,,,...,,,,,,,,,,


In [28]:
# Collect updates for existing companies from the Crunchbase export
cb_path = "jan25/crunchbase search.csv"
db_handler.start_update_process(cb_path, "cb")
db_handler.update_companies_db.shape

(76, 61)

In [29]:
db_handler.update_companies_db.tail()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Number of Patents,Comments,Unnamed: 53,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 58,Unnamed: 59,Normalized_Company_Name
71,NeuroPet,,,,,,,,,,...,,,,,,,,,,
72,Brainster,,,,,,,,,,...,,,,,,,,,,
73,kmoEye,,,,,,,,,,...,,,,,,,,,,
74,Ixtlan Bioscience,,,,,,,,,,...,,,,,,,,,,
75,TreTone,,,,,,,,,,...,,,,,,,,,,


## Export

In [30]:
update_path = "jan25/updated_companies.xlsx"
new_path = "jan25/new_companies.xlsx"

db_handler.export_updates(update_path)
db_handler.export_new(new_path)   

in_db: 292, added: 22
