## NeuroFinder Processing Tool - Jupyter Notebook Version

Welcome to the NeuroFinder Processing Tool. This Jupyter Notebook provides a non-GUI interface for running the project and exploring its features. You can process your data files, update the databases, and generate reports directly within this notebook.

The NeuroFinder Processing Tool automates the management of a comprehensive database containing company information related to neurotechnology. It facilitates the import, standardization, validation, and updating of company data files in multiple formats (e.g., CSV, Excel).

### Objective of This Notebook

This notebook aims to:
* Provide an interactive environment to run the NeuroFinder Processing Tool without the GUI.
* Allow you to load data files, process them, and export the results.
* Help you get familiar with the tool's functionalities.

### Prerequisites

Before running this notebook, ensure you have:

* Python 3.x installed.
* Necessary Python packages (we will install them in the next step).
* Access to the data files you wish to process.
* The main database files (main_database.xlsx, not_neurotech_database.xlsx).

In [None]:
# Install required packages
# (sqlite3 ships with Python's standard library, so it is not installed via pip)
!pip install pandas openpyxl requests python-dotenv matplotlib seaborn


In [None]:
# Upgrade pip for the interpreter that runs this notebook (works on any OS)
%pip install --upgrade pip


In [None]:
# Import standard libraries
import os
import re
import unicodedata
from datetime import datetime as dt

# Import third-party libraries
import pandas as pd
from dotenv import load_dotenv

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")


### Loading Environment Variables

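If you have a .env file with environment variables, you can load it using python-dotenv. The cell below reads four paths from the environment; any variable that is not set is returned as None, so define it in .env or assign the path manually before continuing.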

In [None]:
# Load environment variables
load_dotenv()
MAIN_DB_PATH = os.getenv('MAIN_DB_PATH')
NOT_NEUROTECH_DB_PATH = os.getenv('NOT_NEUROTECH_DB_PATH')
NEW_COMPANIES_PATH = os.getenv('NEW_COMPANIES_PATH')
UPDATED_COMPANIES_PATH = os.getenv('UPDATED_COMPANIES_PATH')
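
If no .env file is available, fallback paths can be supplied in code. The following is a minimal sketch of one way to do that, not part of the tool itself. The helper `resolve_paths` and the `DEFAULT_PATHS` mapping are hypothetical, the first two file names are taken from the prerequisites list, and the last two are assumed placeholders that you should replace with your own.

```python
import os

# Hypothetical fallback locations; adjust them to your own project layout.
DEFAULT_PATHS = {
    "MAIN_DB_PATH": "main_database.xlsx",
    "NOT_NEUROTECH_DB_PATH": "not_neurotech_database.xlsx",
    "NEW_COMPANIES_PATH": "new_companies.xlsx",          # assumed file name
    "UPDATED_COMPANIES_PATH": "updated_companies.xlsx",  # assumed file name
}


def resolve_paths(environ=os.environ, defaults=DEFAULT_PATHS):
    """Return each path from the environment, falling back to its default."""
    return {name: environ.get(name) or default for name, default in defaults.items()}


paths = resolve_paths()
for name, path in paths.items():
    print(f"{name} = {path}")
```

Using `or` rather than the default argument of `get` also replaces variables that are set but empty, which is a common state for a half-filled .env file.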


### Defining Helper Functions

In [None]:
def clean_value(value):
    """Cleans the input value by stripping unwanted characters and converting to int if possible."""
    if pd.isna(value):
        return value
    cleaned_value = str(value).strip('="')
    try:
        return int(cleaned_value)
    except ValueError:
        return cleaned_value

def clean_dataframe(filepath, file_type='csv'):
    """Reads a file into a DataFrame, cleans it, and returns the cleaned DataFrame."""
    read_function = pd.read_csv if file_type == 'csv' else pd.read_excel
    df = read_function(filepath, index_col=False,
                       engine='openpyxl' if file_type == 'excel' else None)
    if 'former company names' in df.columns:
        df['former company names'] = df['former company names'].astype(str)
    for col in df.columns:
        df[col] = df[col].apply(clean_value)
    return df

def escape_special_characters(name: str) -> str:
    """Replaces special characters in a filename with underscores to ensure compatibility."""
    # The hyphen is placed last in the character class so it is read as a literal.
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)
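
To see what the string handling in these helpers does, here is a small standalone sketch using only the standard library. An equivalent copy of `escape_special_characters` is repeated so the block runs on its own. The sample value and the sample file name are made up for illustration.

```python
import re


def escape_special_characters(name: str) -> str:
    """Standalone copy of the helper: keep letters, digits, '_' and '-'."""
    return re.sub(r'[^a-zA-Z0-9_-]', '_', name)


# strip('="') removes any leading or trailing '=' and '"' characters,
# which is the step clean_value applies before trying int().
raw = '="12345"'             # made-up sample value
stripped = raw.strip('="')
print(stripped)              # 12345
print(int(stripped) + 1)     # 12346

# Every character outside the allowed set becomes an underscore,
# including the dot before the file extension.
print(escape_special_characters("CB july'24 (v2).csv"))  # CB_july_24__v2__csv
```

Because the dot is replaced as well, it may be safer to apply `escape_special_characters` to the file stem only and append the extension afterwards.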


### Initializing the Database Handler

Create an instance of the DbHandler class to manage your databases.

In [None]:
from main.backend import DbHandler
# Initialize the database handler
db_handler = DbHandler(MAIN_DB_PATH, NOT_NEUROTECH_DB_PATH)
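
Since the paths come from the environment, it can help to confirm that they are set and point to real files before constructing the handler. The sketch below is an optional pre-check and makes no assumption about how DbHandler itself validates its arguments. The function name `find_missing_files` and the example values are hypothetical.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the names whose path is unset or does not point to an existing file."""
    missing = []
    for name, path in paths.items():
        if not path or not Path(path).is_file():
            missing.append(name)
    return missing


# Placeholder values for illustration; in this notebook you would pass
# {"MAIN_DB_PATH": MAIN_DB_PATH, "NOT_NEUROTECH_DB_PATH": NOT_NEUROTECH_DB_PATH}.
example = {"MAIN_DB_PATH": None, "NOT_NEUROTECH_DB_PATH": ""}
print(find_missing_files(example))  # ['MAIN_DB_PATH', 'NOT_NEUROTECH_DB_PATH']
```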

## Review the data

In [None]:
db_handler.main_db.describe()

In [None]:
db_handler.main_db.shape
# 659 companies X 58 columns (features)

In [None]:
db_handler.main_db.head()

## Review functions

#### Search new companies

In [None]:
# Let's check the new companies database shape: 0 companies means the file is empty
db_handler.new_companies_db.shape

In [None]:
# Let's view the new potential companies from CrunchBase
cb_path = 'main/CB_july24.csv'
cb_new_data = clean_dataframe(cb_path)
cb_new_data.head()

In [None]:
# Let's check the shape of the new CrunchBase data
cb_new_data.shape

In [None]:
# Let's start the search process with the CrunchBase file path and data_type set to "cb"
db_handler.start_searching_process(file_path=cb_path, data_type="cb")
db_handler.new_companies_db.shape  # Check the new companies database shape again
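
The internals of `start_searching_process` are not shown in this notebook. As a conceptual illustration only, the sketch below shows one common way to find candidate names that are absent from an existing list: compare them after normalization. The functions `normalize_name` and `find_new_names` are hypothetical, and "Example Neuro Ltd" is a made-up company. The other two names appear in the main database preview later in this notebook.

```python
import re
import unicodedata


def normalize_name(name):
    """Casefold, drop accents and punctuation so near-identical names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", without_marks.casefold()).strip()


def find_new_names(candidates, known):
    """Return the candidates whose normalized form is not among the known names."""
    known_keys = {normalize_name(name) for name in known}
    return [name for name in candidates if normalize_name(name) not in known_keys]


known = ["1E Therapeutics", "AcousticView"]
candidates = ["acousticview", "1E  Therapeutics!", "Example Neuro Ltd"]
print(find_new_names(candidates, known))  # ['Example Neuro Ltd']
```

Removing only combining marks, instead of forcing names to ASCII, keeps names written in non-Latin scripts such as Hebrew comparable.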

#### Update new companies

In [None]:
db_handler.update_companies_db.shape

In [None]:
# Let's view the CrunchBase data again before running the update
cb_path = 'main/CB_july24.csv'
cb_new_data = clean_dataframe(cb_path)
cb_new_data.head()

In [None]:
db_handler.start_update_process(cb_path, "cb")

In [None]:
db_handler.update_companies_db.shape

In [19]:
import pandas as pd

In [None]:
df = pd.read_excel('main/not_neurotech.xlsx')

In [None]:
df.head()

In [None]:
df.describe()

In [20]:
df = pd.read_excel("main/NeuroTech Industry IL 2024.xlsx")



In [21]:
df.head()

Unnamed: 0,Company Name,Updating_Date,Logo in Visualization folder?,"Operation Status (Active=True, False = False)",INCLUSION,Operation/relevant Notes,Website,Startup Nation Page,Neurotech Category,Market Category,...,Product Stage,Number of Patents,Comments,Unnamed: 52,Contact Name,Contact Phone Number / Email,האם יצרנו איתם כבר קשר? (כדי לא להתיש),BrainstormIL contact,Unnamed: 57,Unnamed: 58
0,1E Therapeutics,2024-02-14 00:00:00,yes,True,True,Not only neurotech: 1E's groundbreaking proces...,https://www.1etx.com/,https://finder.startupnationcentral.org/compan...,NeuroPharmacology | NeuroBioTechnology,Biotechnology & Biopharmaceutical,...,,,,,,,,,,
1,AcousticView,2024-02-14 00:00:00,yes,True,True,,http://www.acousticview.com/,https://finder.startupnationcentral.org/compan...,Imaging | Neuromonitoring,Medical devices | Medical equipment,...,Released,,,,,,,,,
2,ActiView,2022-12-03 00:00:00,n.a,False,False,not neurotech,www.actiview.io/,,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,,,,
3,ActualSignal,2024-07-14 00:00:00,No,True,True,,https://www.actualsignal.com/,https://finder.startupnationcentral.org/compan...,NeuroreHabilitation | NeuroDegenerative | Neur...,Digital & Health care,...,,,,,,,,,,
4,Adam CogTech,2024-02-14 00:00:00,yes,True,True,,http://adam-cogtec.com/,https://finder.startupnationcentral.org/compan...,Cognitive Assessment & Enhancement,Consumer Electronics,...,,,,,,,,אסף הראל,,


In [24]:
lst = df.columns

In [30]:
type(lst.to_list())  # to_list must be called; this value is not displayed
lst[0]

'Company Name'

In [33]:
if 'Company Name' in df.columns:
    print("2")

2
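
The single-column check above extends naturally to several required columns at once. The sketch below works on a plain list of column names, so it runs without pandas; in the notebook you could pass `df.columns` directly. The helper `missing_columns` is hypothetical. The column names are copied from the `df.head()` output above, except 'Founded Year', which is a made-up name used only to show a failing check.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent, in the order given."""
    present = set(columns)
    return [name for name in required if name not in present]


columns = ["Company Name", "Updating_Date", "Website", "Neurotech Category"]

print(missing_columns(columns, ["Company Name", "Website"]))       # []
print(missing_columns(columns, ["Company Name", "Founded Year"]))  # ['Founded Year']
```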
