<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [17]</a>'.</span>

# SciTeX String Processing Utilities

This notebook demonstrates the SciTeX `str` module and its utilities for string processing, formatting, and text manipulation.

## Features Covered

### Text Processing
* String cleaning and path sanitization
* Text search and replacement
* String parsing and extraction
* Space normalization

### Formatting and Display
* Colored text output
* Debug printing utilities
* Block text formatting
* Readable byte formatting

### Scientific Text
* LaTeX formatting and fallbacks
* Scientific notation
* Mathematical text formatting
* Plot text optimization

### Security and Privacy
* API key masking
* ANSI escape code removal
* Safe text handling

In [1]:
# Detect notebook name for output directory
import os
from pathlib import Path

# Get notebook name (for papermill compatibility)
notebook_name = "04_scitex_str"
if 'PAPERMILL_NOTEBOOK_NAME' in os.environ:
    notebook_name = Path(os.environ['PAPERMILL_NOTEBOOK_NAME']).stem


In [2]:
import sys
sys.path.insert(0, '../src')
import scitex
import numpy as np
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
import re
import os

# Set up example data directory
data_dir = Path("./str_examples")
data_dir.mkdir(exist_ok=True)

print("SciTeX String Processing Tutorial - Ready to begin!")
print(f"Available str functions: {len(scitex.str.__all__)}")
print(f"First 10 functions: {scitex.str.__all__[:10]}")

SciTeX String Processing Tutorial - Ready to begin!
Available str functions: 43
First 10 functions: ['LaTeXFallbackError', 'add_hat_in_latex_style', 'auto_factor_axis', 'axis_label', 'check_latex_capability', 'check_unit_consistency', 'clean_path', 'color_text', 'ct', 'decapitalize']


## Part 1: Basic String Processing

### 1.1 String Cleaning and Sanitization

In [3]:
# Path cleaning examples
problematic_paths = [
    "/home/user/data with spaces/file.txt",
    "C:\\Users\\Name\\Documents\\file.txt",
    "~/data/file with special chars!@#.txt",
    "./data//double//slashes///file.txt",
    "data/./current/./directory/file.txt"
]

print("Path Cleaning Examples:")
print("=" * 30)

for path in problematic_paths:
    try:
        cleaned = scitex.str.clean_path(path)
        print(f"Original: {path}")
        print(f"Cleaned:  {cleaned}")
        print()
    except Exception as e:
        print(f"Error cleaning '{path}': {e}")
        print()

# String decapitalization (lowercases only the first character)
test_strings = [
    "Hello World",
    "MACHINE LEARNING",
    "DataScience",
    "python_programming",
    "AI-Research"
]

print("String Decapitalization:")
print("=" * 25)

for text in test_strings:
    decapitalized = scitex.str.decapitalize(text)
    print(f"'{text}' -> '{decapitalized}'")

Path Cleaning Examples:
Original: /home/user/data with spaces/file.txt
Cleaned:  /home/user/data with spaces/file.txt

Original: C:\Users\Name\Documents\file.txt
Cleaned:  C:\Users\Name\Documents\file.txt

Original: ~/data/file with special chars!@#.txt
Cleaned:  ~/data/file with special chars!@#.txt

Original: ./data//double//slashes///file.txt
Cleaned:  data/double/slashes/file.txt

Original: data/./current/./directory/file.txt
Cleaned:  data/current/directory/file.txt

String Decapitalization:
'Hello World' -> 'hello World'
'MACHINE LEARNING' -> 'mACHINE LEARNING'
'DataScience' -> 'dataScience'
'python_programming' -> 'python_programming'
'AI-Research' -> 'aI-Research'
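
The outputs above show two things. First, `clean_path` collapses repeated slashes and `./` segments but leaves spaces, backslashes, and special characters untouched. Second, `decapitalize` lowercases only the first character. The sketch below reproduces that observed behaviour with the standard library. It is an illustration built on those observations, not SciTeX's implementation, and `normalize_posix_path` and `lower_first` are hypothetical names.

```python
import posixpath


def normalize_posix_path(path: str) -> str:
    """Collapse duplicate slashes and '.' segments using POSIX rules.

    Spaces, special characters, and backslashes are left as they are.
    """
    return posixpath.normpath(path)


def lower_first(text: str) -> str:
    """Lowercase only the first character and keep the rest unchanged."""
    return text[:1].lower() + text[1:]


print(normalize_posix_path("./data//double//slashes///file.txt"))  # data/double/slashes/file.txt
print(lower_first("MACHINE LEARNING"))  # mACHINE LEARNING
```

`posixpath` is used instead of `os.path` so the result does not depend on the operating system running the notebook.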


### 1.2 Space Normalization and Text Cleanup

In [4]:
# Space normalization
messy_texts = [
    "This    has     multiple   spaces",
    "\t\nTabs and newlines\t\n everywhere\t\n",
    "   Leading and trailing spaces   ",
    "Mixed\t\n\r\n   whitespace   characters",
    "Normal text with single spaces"
]

print("Space Normalization:")
print("=" * 25)

for text in messy_texts:
    normalized = scitex.str.squeeze_spaces(text)
    print(f"Original: '{text}'")
    print(f"Squeezed: '{normalized}'")
    print()

# ANSI escape code removal
colored_texts = [
    "\033[31mRed text\033[0m",
    "\033[1;32mBold green text\033[0m",
    "\033[4;34mUnderlined blue text\033[0m",
    "Normal text without ANSI codes",
    "\033[91mBright red\033[0m mixed with \033[92mgreen\033[0m"
]

print("ANSI Code Removal:")
print("=" * 20)

for text in colored_texts:
    clean_text = scitex.str.remove_ansi(text)
    print(f"With ANSI: '{text}'")
    print(f"Cleaned:   '{clean_text}'")
    print()

Space Normalization:
Original: 'This    has     multiple   spaces'
Squeezed: 'This has multiple spaces'

Original: '	
Tabs and newlines	
 everywhere	
'
Squeezed: '	
Tabs and newlines	
 everywhere	
'

Original: '   Leading and trailing spaces   '
Squeezed: ' Leading and trailing spaces '

Original: 'Mixed	

   whitespace   characters'
Squeezed: 'Mixed	

 whitespace characters'

Original: 'Normal text with single spaces'
Squeezed: 'Normal text with single spaces'

ANSI Code Removal:
With ANSI: '[31mRed text[0m'
Cleaned:   'Red text'

With ANSI: '[1;32mBold green text[0m'
Cleaned:   'Bold green text'

With ANSI: '[4;34mUnderlined blue text[0m'
Cleaned:   'Underlined blue text'

With ANSI: 'Normal text without ANSI codes'
Cleaned:   'Normal text without ANSI codes'

With ANSI: '[91mBright red[0m mixed with [92mgreen[0m'
Cleaned:   'Bright red mixed with green'
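
As the output shows, `squeeze_spaces` collapses only runs of the space character. Tabs, newlines, and single leading or trailing spaces survive. Below is a minimal `re`-based sketch of that behaviour, together with a variant that normalizes all whitespace and an ANSI stripper. The helper names are hypothetical. The pattern `\x1b\[[0-9;]*m` is also an assumption: it covers only SGR color and style codes like the ones above and is not a complete ANSI parser.

```python
import re

# Matches SGR escape sequences such as "\x1b[31m" or "\x1b[1;32m"
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def collapse_spaces(text: str) -> str:
    """Collapse runs of ' ' into a single space, as observed above."""
    return re.sub(r" +", " ", text)


def normalize_whitespace(text: str) -> str:
    """Collapse any whitespace (spaces, tabs, newlines) and trim both ends."""
    return " ".join(text.split())


def strip_sgr(text: str) -> str:
    """Remove SGR color/style escape codes."""
    return ANSI_SGR.sub("", text)


print(repr(normalize_whitespace("\t\nTabs and newlines\t\n everywhere\t\n")))
```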



## Part 2: Text Search and Manipulation

### 2.1 Search and Grep Functionality

In [5]:
# Create sample text for searching
sample_text = """
This is a sample text for demonstrating search functionality.
The text contains multiple lines and various patterns.
We have numbers like 123, 456, and 789.
Email addresses: john@example.com, jane.doe@university.edu
Phone numbers: (555) 123-4567, 555-987-6543
URLs: https://www.example.com, http://test.org
Some special characters: !@#$%^&*()
And finally, this text ends here.
"""

# Write to file for grep demonstration
test_file = data_dir / "sample_text.txt"
with open(test_file, 'w') as f:
    f.write(sample_text)

print("Text Search and Grep:")
print("=" * 25)

# Search for patterns in text
search_patterns = [
    r"sample",
    r"[0-9]+",  # Numbers
    r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",  # Email (raw string avoids invalid-escape warnings)
    r"https?://[^\s]+",  # URLs
    r"\([0-9]{3}\) [0-9]{3}-[0-9]{4}",  # Phone numbers
]

for pattern in search_patterns:
    print(f"\nSearching for pattern: '{pattern}'")
    try:
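        # Judging by the recorded output, search() returns a tuple of two lists,
        # so len(results) counts the tuple's elements, not the matches.
        results = scitex.str.search(sample_text, pattern)
        if results:
            print(f"Found {len(results)} matches: {results}")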
        else:
            print("No matches found")
    except Exception as e:
        print(f"Search error: {e}")

# Grep in file
print(f"\nGrep in file '{test_file}':")
try:
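    # The ([], []) result below suggests grep() does not read the file at this path;
    # this is an inference from the output, not documented behaviour.
    grep_results = scitex.str.grep(str(test_file), "numbers")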
    if grep_results:
        print(f"Grep results: {grep_results}")
    else:
        print("No grep results")
except Exception as e:
    print(f"Grep error: {e}")

Text Search and Grep:

Searching for pattern: 'sample'
Found 2 matches: ([], [])

Searching for pattern: '[0-9]+'
Found 2 matches: ([], [])

Searching for pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
Found 2 matches: ([], [])

Searching for pattern: 'https?://[^\s]+'
Found 2 matches: ([], [])

Searching for pattern: '\([0-9]{3}\) [0-9]{3}-[0-9]{4}'
Found 2 matches: ([], [])

Grep in file 'str_examples/sample_text.txt':
Grep results: ([], [])
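
Every call above returned `([], [])`. This suggests that `search` and `grep` expect different arguments from the ones this cell passes. That reading is inferred from the output only. For comparison, here is a standard-library sketch of what the cell sets out to do: find every regex match in a string, and list the lines of a file that contain a pattern. `find_all` and `grep_file` are hypothetical helpers, not SciTeX functions.

```python
import re
from pathlib import Path


def find_all(text: str, pattern: str) -> list[str]:
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


def grep_file(path: Path, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines that match pattern."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as f:
        return [(i, line.rstrip("\n")) for i, line in enumerate(f, start=1)
                if regex.search(line)]


text = "Email addresses: john@example.com, jane.doe@university.edu"
print(find_all(text, r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"))
```

The URL pattern `https?://[^\s]+` also captures the trailing comma in `https://www.example.com,`. Excluding commas (`[^\s,]+`) avoids that.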


### 2.2 Text Replacement and Parsing

In [6]:
# Text replacement examples
replacement_examples = [
    ("Hello World", "World", "Python"),
    ("The quick brown fox", "brown", "red"),
    ("Machine Learning AI", "AI", "Artificial Intelligence"),
    ("Data Science 2024", "2024", "2025"),
    ("test@example.com", "@example.com", "@newdomain.org")
]

print("Text Replacement:")
print("=" * 20)

for original, old, new in replacement_examples:
    try:
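        # scitex.str.replace accepts only one or two positional arguments (see the
        # error below), so this three-argument call fails; the built-in
        # original.replace(old, new) performs this substitution.
        replaced = scitex.str.replace(original, old, new)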
        print(f"Original: '{original}'")
        print(f"Replace '{old}' with '{new}': '{replaced}'")
        print()
    except Exception as e:
        print(f"Replacement error: {e}")
        print()

# Text parsing
parse_examples = [
    "name=John age=30 city=NYC",
    "temperature=25.5 humidity=60% pressure=1013.25",
    "model=LinearRegression accuracy=0.95 loss=0.05",
    "date=2024-01-01 time=12:30:00 timezone=UTC"
]

print("Text Parsing:")
print("=" * 15)

for text in parse_examples:
    try:
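        # parse() requires a second positional argument, pattern_or_input_str
        # (see the error below), so a lone key=value string is not enough.
        parsed = scitex.str.parse(text)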
        print(f"Original: '{text}'")
        print(f"Parsed: {parsed}")
        print()
    except Exception as e:
        print(f"Parse error for '{text}': {e}")
        print()

Text Replacement:
Replacement error: replace() takes from 1 to 2 positional arguments but 3 were given

Replacement error: replace() takes from 1 to 2 positional arguments but 3 were given

Replacement error: replace() takes from 1 to 2 positional arguments but 3 were given

Replacement error: replace() takes from 1 to 2 positional arguments but 3 were given

Replacement error: replace() takes from 1 to 2 positional arguments but 3 were given

Text Parsing:
Parse error for 'name=John age=30 city=NYC': parse() missing 1 required positional argument: 'pattern_or_input_str'

Parse error for 'temperature=25.5 humidity=60% pressure=1013.25': parse() missing 1 required positional argument: 'pattern_or_input_str'

Parse error for 'model=LinearRegression accuracy=0.95 loss=0.05': parse() missing 1 required positional argument: 'pattern_or_input_str'

Parse error for 'date=2024-01-01 time=12:30:00 timezone=UTC': parse() missing 1 required positional argument: 'pattern_or_input_str'
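
The errors show that `scitex.str.replace` takes at most two positional arguments and that `parse` needs a second argument, `pattern_or_input_str`. Neither matches the calls in this cell. If the goal is simply to substitute substrings and split `key=value` records, plain Python handles both. The helpers below (`replace_all`, `parse_key_values`) are hypothetical. They assume pairs separated by whitespace and values that contain no spaces.

```python
def replace_all(text: str, replacements: dict[str, str]) -> str:
    """Apply each old -> new substitution in insertion order."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def parse_key_values(text: str) -> dict[str, str]:
    """Split 'k1=v1 k2=v2' into a dict; values stay strings."""
    pairs = (item.split("=", 1) for item in text.split() if "=" in item)
    return {key: value for key, value in pairs}


print(replace_all("Hello World", {"World": "Python"}))
print(parse_key_values("date=2024-01-01 time=12:30:00 timezone=UTC"))
```

Values stay strings, so numeric fields such as `age=30` need an explicit `float()` or `int()` conversion afterwards.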



## Part 3: Colored Text and Debug Output

### 3.1 Colored Text Output

In [7]:
# Colored text examples
colors = ['red', 'green', 'blue', 'yellow', 'magenta', 'cyan', 'white']

print("Colored Text Output:")
print("=" * 25)

# Basic colors
for color in colors:
    try:
        colored = scitex.str.color_text(f"This is {color} text", color)
        print(colored)
    except Exception as e:
        print(f"Color error for {color}: {e}")

# Using shorthand ct function
print("\nUsing ct() shorthand:")
try:
    print(scitex.str.ct("Success!", "green"))
    print(scitex.str.ct("Warning!", "yellow"))
    print(scitex.str.ct("Error!", "red"))
    print(scitex.str.ct("Info", "blue"))
except Exception as e:
    print(f"ct() error: {e}")

# Demonstration of different message types
messages = [
    ("Operation completed successfully", "green"),
    ("Warning: Low disk space", "yellow"),
    ("Error: File not found", "red"),
    ("Info: Processing data", "blue"),
    ("Debug: Variable x = 42", "magenta")
]

print("\nMessage Types:")
for message, color in messages:
    try:
        colored_msg = scitex.str.color_text(message, color)
        print(colored_msg)
    except Exception as e:
        print(f"Message coloring error: {e}")

Colored Text Output:
[91mThis is red text[0m
[92mThis is green text[0m
[94mThis is blue text[0m
[93mThis is yellow text[0m
[95mThis is magenta text[0m
[96mThis is cyan text[0m
[97mThis is white text[0m

Using ct() shorthand:
[92mSuccess![0m
[91mError![0m
[94mInfo[0m

Message Types:
[92mOperation completed successfully[0m
[91mError: File not found[0m
[94mInfo: Processing data[0m
[95mDebug: Variable x = 42[0m

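The raw output shows the escape codes that `color_text` emits. Each color uses a bright foreground code followed by the reset code 0:

* 91: red
* 92: green
* 93: yellow
* 94: blue
* 95: magenta
* 96: cyan
* 97: white

The sketch below wraps text in those same codes. The `BRIGHT_CODES` table is read off the output above rather than taken from SciTeX's source, and `wrap_color` is a hypothetical name.

```python
# Bright ANSI foreground codes, as observed in the recorded output
BRIGHT_CODES = {
    "red": 91, "green": 92, "yellow": 93, "blue": 94,
    "magenta": 95, "cyan": 96, "white": 97,
}


def wrap_color(text: str, color: str) -> str:
    """Wrap text in an ANSI color code and a trailing reset (code 0)."""
    code = BRIGHT_CODES[color]  # raises KeyError for unsupported colors
    return f"\033[{code}m{text}\033[0m"


print(wrap_color("Success!", "green"))
```

Terminals and notebook front ends render these codes as color. Plain-text logs keep them verbatim, which is why stripping them before writing files is useful.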

### 3.2 Debug Printing and Block Formatting

In [8]:
import pandas as pd
import numpy as np
# Debug printing examples
debug_data = {
    'variables': {'x': 42, 'y': 3.14, 'name': 'test'},
    'array': np.array([1, 2, 3, 4, 5]),
    'dataframe': pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}),
    'list': [1, 2, 3, 4, 5],
    'nested': {'level1': {'level2': {'value': 100}}}
}

print("Debug Printing:")
print("=" * 18)

for name, data in debug_data.items():
    try:
        print(f"\nDebugging {name}:")
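        # print_debug() accepts no positional arguments (see the error below),
        # so this call fails for every item.
        scitex.str.print_debug(data, name)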
    except Exception as e:
        print(f"Debug print error for {name}: {e}")

# Block text formatting
block_texts = [
    "This is a simple message",
    "This is a longer message that demonstrates block formatting capabilities",
    "Multi-line\nblock text\nexample",
    "Important: This is a critical message that needs attention!"
]

print("\nBlock Text Formatting:")
print("=" * 25)

for text in block_texts:
    try:
        print(f"\nOriginal: '{text}'")
        scitex.str.printc(text)  # Print colored/formatted block
    except Exception as e:
        print(f"Block formatting error: {e}")

Debug Printing:

Debugging variables:
Debug print error for variables: print_debug() takes 0 positional arguments but 2 were given

Debugging array:
Debug print error for array: print_debug() takes 0 positional arguments but 2 were given

Debugging dataframe:
Debug print error for dataframe: print_debug() takes 0 positional arguments but 2 were given

Debugging list:
Debug print error for list: print_debug() takes 0 positional arguments but 2 were given

Debugging nested:
Debug print error for nested: print_debug() takes 0 positional arguments but 2 were given

Block Text Formatting:

Original: 'This is a simple message'
[94m
----------------------------------------
This is a simple message
----------------------------------------
[0m

Original: 'This is a longer message that demonstrates block formatting capabilities'
[94m
----------------------------------------
This is a longer message that demonstrates block formatting capabilities
----------------------------------------
[0m

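`printc` rendered each message between two rules of dashes and wrapped the block in the blue code 94. The sketch below builds the same kind of block as a string, so it can be logged or tested instead of printed. The 40-character width and the blue default are read from the output above and are assumptions about `printc`'s defaults. `format_block` is a hypothetical name.

```python
def format_block(text: str, char: str = "-", width: int = 40, color_code: int = 94) -> str:
    """Surround text with horizontal rules and wrap it in an ANSI color."""
    rule = char * width
    return f"\033[{color_code}m\n{rule}\n{text}\n{rule}\n\033[0m"


print(format_block("This is a simple message"))
```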


## Part 4: Scientific Text and LaTeX Formatting

### 4.1 LaTeX Style Formatting

In [9]:
# LaTeX style formatting examples
scientific_texts = [
    "alpha",
    "beta",
    "gamma",
    "theta",
    "lambda",
    "mu",
    "sigma",
    "phi",
    "x_hat",
    "y_bar",
    "z_prime"
]

print("LaTeX Style Formatting:")
print("=" * 25)

for text in scientific_texts:
    try:
        latex_formatted = scitex.str.to_latex_style(text)
        print(f"'{text}' -> '{latex_formatted}'")
    except Exception as e:
        print(f"LaTeX formatting error for '{text}': {e}")

# Safe LaTeX formatting with fallback
print("\nSafe LaTeX Formatting:")
print("=" * 25)

for text in scientific_texts:
    try:
        safe_latex = scitex.str.safe_to_latex_style(text)
        print(f"'{text}' -> '{safe_latex}'")
    except Exception as e:
        print(f"Safe LaTeX error for '{text}': {e}")

# Hat notation in LaTeX
hat_examples = ['x', 'y', 'z', 'theta', 'phi', 'mu']
print("\nHat Notation:")
print("=" * 15)

for var in hat_examples:
    try:
        hat_formatted = scitex.str.add_hat_in_latex_style(var)
        print(f"'{var}' -> '{hat_formatted}'")
    except Exception as e:
        print(f"Hat formatting error for '{var}': {e}")

LaTeX Style Formatting:


'alpha' -> 'alpha'
'beta' -> 'beta'
'gamma' -> 'gamma'
'theta' -> 'theta'
'lambda' -> 'lambda'
'mu' -> 'mu'
'sigma' -> 'sigma'
'phi' -> 'phi'
'x_hat' -> 'x_hat'
'y_bar' -> 'y_bar'
'z_prime' -> 'z_prime'

Safe LaTeX Formatting:
'alpha' -> 'alpha'
'beta' -> 'beta'
'gamma' -> 'gamma'
'theta' -> 'theta'
'lambda' -> 'lambda'
'mu' -> 'mu'
'sigma' -> 'sigma'
'phi' -> 'phi'
'x_hat' -> 'x_hat'
'y_bar' -> 'y_bar'
'z_prime' -> 'z_prime'

Hat Notation:
'x' -> 'x'
'y' -> 'y'
'z' -> 'z'
'theta' -> 'theta'
'phi' -> 'phi'
'mu' -> 'mu'
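
The recorded output shows each name coming back unchanged from all three helpers. For reference, the usual target of LaTeX-style formatting in plots is a Matplotlib mathtext string such as `$\alpha$` or `$\hat{x}$`. The sketch below produces those strings from a small, hypothetical table of Greek names. It illustrates the target format and does not reproduce SciTeX's logic.

```python
GREEK = {"alpha", "beta", "gamma", "theta", "lambda", "mu", "sigma", "phi"}


def to_mathtext(name: str) -> str:
    """Wrap a variable name in $...$, turning known Greek names into commands."""
    body = f"\\{name}" if name in GREEK else name
    return f"${body}$"


def add_hat(name: str) -> str:
    """Return mathtext for a hatted variable, e.g. '$\\hat{x}$'."""
    body = f"\\{name}" if name in GREEK else name
    return f"$\\hat{{{body}}}$"


print(to_mathtext("alpha"))
print(add_hat("theta"))
```

Matplotlib renders such strings with its built-in mathtext engine, so no LaTeX installation is required for labels like `ax.set_xlabel(to_mathtext("theta"))`.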


### 4.2 Scientific Text and Plot Formatting

In [10]:
# Scientific text formatting for plots
plot_labels = [
    ("Temperature (C)", "°C"),
    ("Pressure (Pa)", "Pa"),
    ("Voltage (V)", "V"),
    ("Current (A)", "A"),
    ("Frequency (Hz)", "Hz"),
    ("Energy (J)", "J"),
    ("Power (W)", "W")
]

print("Scientific Text Formatting:")
print("=" * 30)

for label, unit in plot_labels:
    try:
        formatted = scitex.str.scientific_text(label)
        print(f"'{label}' -> '{formatted}'")
    except Exception as e:
        print(f"Scientific text error for '{label}': {e}")

# Plot text formatting
print("\nPlot Text Formatting:")
print("=" * 25)

plot_texts = [
    "x-axis label",
    "y-axis label",
    "Main Title",
    "Subplot Title",
    "Legend Entry"
]

for text in plot_texts:
    try:
        formatted = scitex.str.format_plot_text(text)
        print(f"'{text}' -> '{formatted}'")
    except Exception as e:
        print(f"Plot text formatting error for '{text}': {e}")

# Axis labels and titles
print("\nAxis Labels and Titles:")
print("=" * 25)

axis_examples = [
    ("time", "seconds"),
    ("amplitude", "volts"),
    ("frequency", "hertz"),
    ("temperature", "celsius"),
    ("pressure", "pascals")
]

for var, unit in axis_examples:
    try:
        axis_label = scitex.str.axis_label(var, unit)
        title_formatted = scitex.str.title(f"{var} vs time")
        print(f"Variable: {var}, Unit: {unit}")
        print(f"  Axis label: '{axis_label}'")
        print(f"  Title: '{title_formatted}'")
        print()
    except Exception as e:
        print(f"Axis formatting error for {var}: {e}")

Scientific Text Formatting:
'Temperature (C)' -> 'Temperature (C)'
'Pressure (Pa)' -> 'Pressure (Pa)'
'Voltage (V)' -> 'Voltage (V)'
'Current (A)' -> 'Current (A)'
'Frequency (Hz)' -> 'Frequency (Hz)'
'Energy (J)' -> 'Energy (J)'
'Power (W)' -> 'Power (W)'

Plot Text Formatting:
'x-axis label' -> 'X-axis Label'
'y-axis label' -> 'Y-axis Label'
'Main Title' -> 'Main Title'
'Subplot Title' -> 'Subplot Title'
'Legend Entry' -> 'Legend Entry'

Axis Labels and Titles:
Variable: time, Unit: seconds
  Axis label: 'Time (seconds)'
  Title: 'Time Vs Time'

Variable: amplitude, Unit: volts
  Axis label: 'Amplitude (volts)'
  Title: 'Amplitude Vs Time'

Variable: frequency, Unit: hertz
  Axis label: 'Frequency (hertz)'
  Title: 'Frequency Vs Time'

Variable: temperature, Unit: celsius
  Axis label: 'Temperature (celsius)'
  Title: 'Temperature Vs Time'

Variable: pressure, Unit: pascals
  Axis label: 'Pressure (pascals)'
  Title: 'Pressure Vs Time'
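
`axis_label` produced `Name (unit)`, while `title` capitalized every word, including `vs`. The sketch below follows the same label convention and adds a title helper that keeps short connector words lowercase. The contents of `LOWERCASE_WORDS` are an arbitrary assumption, and all helper names are hypothetical.

```python
LOWERCASE_WORDS = {"vs", "vs.", "and", "of", "per", "in"}


def _cap(word: str) -> str:
    """Uppercase the first letter only, preserving acronyms such as 'RMS'."""
    return word[:1].upper() + word[1:]


def make_axis_label(variable: str, unit: str) -> str:
    """Format ('time', 'seconds') as 'Time (seconds)'."""
    return f"{_cap(variable)} ({unit})"


def make_title(text: str) -> str:
    """Capitalize each word except connectors; the first word is always capitalized."""
    words = text.split()
    return " ".join(
        w if i > 0 and w.lower() in LOWERCASE_WORDS else _cap(w)
        for i, w in enumerate(words)
    )


print(make_axis_label("amplitude", "volts"))
print(make_title("amplitude vs time"))
```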



### 4.3 Digit Factoring and Smart Formatting

In [11]:
import numpy as np
# Digit factoring for better readability
large_numbers = [
    [1000, 2000, 3000, 4000, 5000],
    [1500000, 2500000, 3500000, 4500000],
    [0.001, 0.002, 0.003, 0.004, 0.005],
    [12345, 23456, 34567, 45678, 56789],
    [1.2e6, 2.3e6, 3.4e6, 4.5e6, 5.6e6]
]

print("Digit Factoring:")
print("=" * 20)

for numbers in large_numbers:
    try:
        factored = scitex.str.factor_out_digits(numbers)
        print(f"Original: {numbers}")
        print(f"Factored: {factored}")
        print()
    except Exception as e:
        print(f"Digit factoring error: {e}")
        print()

# Smart tick formatting
tick_examples = [
    np.array([0, 1000, 2000, 3000, 4000, 5000]),
    np.array([0.001, 0.002, 0.003, 0.004, 0.005]),
    np.array([1e6, 2e6, 3e6, 4e6, 5e6]),
    np.array([0.0001, 0.0002, 0.0003, 0.0004, 0.0005])
]

print("Smart Tick Formatting:")
print("=" * 25)

for ticks in tick_examples:
    try:
        formatted = scitex.str.smart_tick_formatter(ticks)
        print(f"Original ticks: {ticks}")
        print(f"Formatted: {formatted}")
        print()
    except Exception as e:
        print(f"Smart tick formatting error: {e}")
        print()

Digit Factoring:
Original: [1000, 2000, 3000, 4000, 5000]
Factored: ([1.0, 2.0, 3.0, 4.0, 5.0], '$\\times 10^{3}$')

Original: [1500000, 2500000, 3500000, 4500000]
Factored: ([1.5, 2.5, 3.5, 4.5], '$\\times 10^{6}$')

Original: [0.001, 0.002, 0.003, 0.004, 0.005]
Factored: ([1.0, 2.0, 3.0, 4.0, 5.0], '$\\times 10^{-3}$')

Original: [12345, 23456, 34567, 45678, 56789]
Factored: ([1.23, 2.35, 3.46, 4.57, 5.68], '$\\times 10^{4}$')

Original: [1200000.0, 2300000.0, 3400000.0, 4500000.0, 5600000.0]
Factored: ([1.2, 2.3, 3.4, 4.5, 5.6], '$\\times 10^{6}$')

Smart Tick Formatting:
Original ticks: [   0 1000 2000 3000 4000 5000]
Formatted: (array([1000., 2000., 3000., 4000.]), ['1', '2', '3', '4'], '$\\times 10^{3}$')

Original ticks: [0.001 0.002 0.003 0.004 0.005]
Formatted: (array([0.0016, 0.0024, 0.0032, 0.004 , 0.0048]), ['1.6', '2.4', '3.2', '4', '4.8'], '$\\times 10^{-3}$')

Original ticks: [1000000. 2000000. 3000000. 4000000. 5000000.]
Formatted: (array([1600000., 2400000., 3200000., 
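
The factored results are consistent with a simple rule:

1. Choose the exponent as the floor of `log10` of the largest absolute value.
2. Round the scaled values to two decimals.

The sketch below implements that rule. It reproduces the five `factor_out_digits` examples above, but it is a reconstruction rather than SciTeX's code, and `factor_common_power` is a hypothetical name.

```python
import math


def factor_common_power(values, precision=2):
    """Divide values by 10**k, where k = floor(log10(max |value|)).

    Returns the scaled values and a mathtext suffix such as '$\\times 10^{3}$'.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        return [0.0 for _ in values], ""
    exponent = math.floor(math.log10(largest))
    scaled = [round(v / 10**exponent, precision) for v in values]
    return scaled, f"$\\times 10^{{{exponent}}}$"


print(factor_common_power([12345, 23456, 34567, 45678, 56789]))
```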

## Part 5: Security and Privacy Features

### 5.1 API Key Masking

In [12]:
# API key masking examples
sensitive_data = [
    "API_KEY=sk-1234567890abcdef1234567890abcdef",
    "SECRET_TOKEN=ghp_1234567890abcdef1234567890abcdef123456",
    "DATABASE_URL=postgresql://user:password@localhost:5432/db",
    "OPENAI_API_KEY=sk-proj-abcdef1234567890abcdef1234567890abcdef",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "This is normal text without sensitive data"
]

print("API Key Masking:")
print("=" * 20)

for data in sensitive_data:
    try:
        masked = scitex.str.mask_api(data)
        print(f"Original: '{data}'")
        print(f"Masked:   '{masked}'")
        print()
    except Exception as e:
        print(f"Masking error for '{data}': {e}")
        print()

# Demonstration of different API key formats
api_formats = [
    "sk-1234567890abcdef",  # OpenAI style
    "ghp_1234567890abcdef",  # GitHub style
    "xoxb-1234567890",  # Slack style
    "ya29.1234567890",  # Google style
    "EAACEdEose0cBA1234567890"  # Facebook style
]

print("Different API Key Formats:")
print("=" * 30)

for api_key in api_formats:
    try:
        masked = scitex.str.mask_api(api_key)
        print(f"API Key: '{api_key}' -> '{masked}'")
    except Exception as e:
        print(f"Error masking '{api_key}': {e}")

API Key Masking:
Original: 'API_KEY=sk-1234567890abcdef1234567890abcdef'
Masked:   'API_****cdef'

Original: 'SECRET_TOKEN=ghp_1234567890abcdef1234567890abcdef123456'
Masked:   'SECR****3456'

Original: 'DATABASE_URL=postgresql://user:password@localhost:5432/db'
Masked:   'DATA****2/db'

Original: 'OPENAI_API_KEY=sk-proj-abcdef1234567890abcdef1234567890abcdef'
Masked:   'OPEN****cdef'

Original: 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE'
Masked:   'AWS_****MPLE'

Original: 'This is normal text without sensitive data'
Masked:   'This****data'

Different API Key Formats:
API Key: 'sk-1234567890abcdef' -> 'sk-1****cdef'
API Key: 'ghp_1234567890abcdef' -> 'ghp_****cdef'
API Key: 'xoxb-1234567890' -> 'xoxb****7890'
API Key: 'ya29.1234567890' -> 'ya29****7890'
API Key: 'EAACEdEose0cBA1234567890' -> 'EAAC****7890'
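
Every output above has the same form: the first four characters of the whole input, then `****`, then its last four characters. This holds even for the plain sentence. So `mask_api` appears to mask whatever string it is given rather than detecting secrets inside text. When secrets are embedded in longer text, one option is to mask only the value of `NAME=value` pairs whose name looks sensitive. Both helpers below are hypothetical sketches. The keyword list in `SENSITIVE` is an assumption and will miss secrets that do not follow it.

```python
import re

# Names containing any of these words are treated as secrets (an assumption)
SENSITIVE = re.compile(r"\b(\w*(?:KEY|TOKEN|SECRET|PASSWORD|URL)\w*)=(\S+)")


def mask_value(value: str, keep: int = 4) -> str:
    """Keep the first and last `keep` characters, as mask_api did above."""
    if len(value) <= 2 * keep:
        return "*" * len(value)
    return f"{value[:keep]}****{value[-keep:]}"


def mask_secrets_in_text(text: str) -> str:
    """Mask only the values of NAME=value pairs whose NAME looks sensitive."""
    return SENSITIVE.sub(lambda m: f"{m.group(1)}={mask_value(m.group(2))}", text)


print(mask_secrets_in_text("alpha = 0.05 with API_KEY=sk-1234567890abcdef"))
```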


## Part 6: Utility Functions

### 6.1 Readable Bytes and File Sizes

In [13]:
# Readable byte formatting
byte_sizes = [
    1024,  # 1 KiB
    1024**2,  # 1 MiB
    1024**3,  # 1 GiB
    1024**4,  # 1 TiB
    1500000,  # 1.5 MB (decimal), about 1.4 MiB
    2500000000,  # 2.5 GB (decimal), about 2.3 GiB
    512,  # 512 bytes
    1023,  # Just under 1 KiB
    1048576 + 524288  # 1.5 MiB
]

print("Readable Byte Formatting:")
print("=" * 30)

for size in byte_sizes:
    try:
        readable = scitex.str.readable_bytes(size)
        print(f"{size:>12} bytes -> {readable}")
    except Exception as e:
        print(f"Error formatting {size}: {e}")

# File size examples with actual files
print("\nFile Size Examples:")
print("=" * 20)

# Create test files of different sizes
test_files = [
    ("small.txt", "Small file content"),
    ("medium.txt", "Medium file content\n" * 1000),
    ("large.txt", "Large file content with lots of text\n" * 10000)
]

for filename, content in test_files:
    filepath = data_dir / filename
    with open(filepath, 'w') as f:
        f.write(content)
    
    file_size = filepath.stat().st_size
    readable_size = scitex.str.readable_bytes(file_size)
    print(f"{filename}: {file_size} bytes -> {readable_size}")

Readable Byte Formatting:
        1024 bytes -> 1.0 KiB
     1048576 bytes -> 1.0 MiB
  1073741824 bytes -> 1.0 GiB
1099511627776 bytes -> 1.0 TiB
     1500000 bytes -> 1.4 MiB
  2500000000 bytes -> 2.3 GiB
         512 bytes -> 512.0 B
        1023 bytes -> 1023.0 B
     1572864 bytes -> 1.5 MiB

File Size Examples:
small.txt: 18 bytes -> 18.0 B
medium.txt: 20000 bytes -> 19.5 KiB
large.txt: 370000 bytes -> 361.3 KiB
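
`readable_bytes` uses binary (IEC) units, dividing by 1024 at each step. That is why 1,500,000 bytes prints as 1.4 MiB rather than 1.5 MB. The sketch below reproduces every value above, assuming one decimal place and the IEC prefixes shown. `format_iec` is a hypothetical name.

```python
UNITS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")


def format_iec(num_bytes: float) -> str:
    """Format a byte count with IEC prefixes (1 KiB = 1024 B) and one decimal."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if abs(value) < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} {UNITS[-1]}"


for size in (512, 1024, 1500000, 2500000000):
    print(size, "->", format_iec(size))
```

For decimal (SI) units, divide by 1000 at each step and use kB, MB, and GB instead.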


## Part 7: LaTeX Fallback System

### 7.1 LaTeX Capability Detection and Fallbacks

In [14]:
# LaTeX capability detection
print("LaTeX Capability Detection:")
print("=" * 30)

try:
    latex_available = scitex.str.check_latex_capability()
    print(f"LaTeX available: {latex_available}")
    
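    # In the recorded run this call raised NameError ("name 'plt' is not defined")
    # from inside the library, so the remaining status lines were skipped.
    latex_status = scitex.str.get_latex_status()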
    print(f"LaTeX status: {latex_status}")
    
    fallback_mode = scitex.str.get_fallback_mode()
    print(f"Fallback mode: {fallback_mode}")
    
except Exception as e:
    print(f"LaTeX detection error: {e}")

# LaTeX fallback examples
latex_expressions = [
    r"$\alpha + \beta = \gamma$",
    r"$\frac{x^2}{y^2} = z$",
    r"$\sum_{i=1}^{n} x_i$",
    r"$\int_{0}^{\infty} e^{-x} dx$",
    r"$\sqrt{\frac{a}{b}}$"
]

print("\nLaTeX Fallback Examples:")
print("=" * 28)

for expr in latex_expressions:
    try:
        # Try safe rendering
        safe_rendered = scitex.str.safe_latex_render(expr)
        print(f"LaTeX: '{expr}'")
        print(f"Safe:  '{safe_rendered}'")
        
        # Try conversion to unicode
        unicode_version = scitex.str.latex_to_unicode(expr)
        print(f"Unicode: '{unicode_version}'")
        
        # Try conversion to mathtext
        mathtext_version = scitex.str.latex_to_mathtext(expr)
        print(f"MathText: '{mathtext_version}'")
        print()
        
    except Exception as e:
        print(f"LaTeX processing error for '{expr}': {e}")
        print()

# Fallback mode management
print("Fallback Mode Management:")
print("=" * 28)

try:
    # Enable fallback
    scitex.str.enable_latex_fallback()
    print("LaTeX fallback enabled")
    
    # Set fallback mode
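    # 'unicode' was rejected in the recorded run ("Invalid fallback mode");
    # check set_fallback_mode's docstring for the accepted mode names.
    scitex.str.set_fallback_mode('unicode')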
    print("Fallback mode set to 'unicode'")
    
    # Test with fallback
    test_expr = r"$\alpha + \beta$"
    result = scitex.str.safe_latex_render(test_expr)
    print(f"Test expression: '{test_expr}' -> '{result}'")
    
except Exception as e:
    print(f"Fallback management error: {e}")

LaTeX Capability Detection:
LaTeX available: True
LaTeX detection error: name 'plt' is not defined

LaTeX Fallback Examples:
LaTeX: '$\alpha + \beta = \gamma$'
Safe:  'α + β = γ'
Unicode: 'α + β = γ'
LaTeX processing error for '$\alpha + \beta = \gamma$': missing < at position 2

LaTeX: '$\frac{x^2}{y^2} = z$'
Safe:  'x²y² = z'
Unicode: 'x²y² = z'
LaTeX processing error for '$\frac{x^2}{y^2} = z$': missing < at position 2

LaTeX: '$\sum_{i=1}^{n} x_i$'
Safe:  '∑_i=1^n x_i'
Unicode: '∑_i=1^n x_i'
LaTeX processing error for '$\sum_{i=1}^{n} x_i$': missing < at position 2

LaTeX: '$\int_{0}^{\infty} e^{-x} dx$'
Safe:  '∫₀^∞ e^-x dx'
Unicode: '∫₀^∞ e^-x dx'
LaTeX processing error for '$\int_{0}^{\infty} e^{-x} dx$': missing < at position 2

LaTeX: '$\sqrt{\frac{a}{b}}$'
Safe:  '√ab'
Unicode: '√ab'
LaTeX processing error for '$\sqrt{\frac{a}{b}}$': missing < at position 2

Fallback Mode Management:
LaTeX fallback enabled
Fallback management error: Invalid fallback mode: unicode
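
The `Unicode` lines illustrate the basic idea of a text fallback: replace LaTeX commands with Unicode symbols and drop the `$` delimiters. They also show its limits, since `\frac{x^2}{y^2}` became `x²y²` and lost the division. Below is a minimal sketch of the symbol-substitution step, using a small, hypothetical table. Unknown commands are left as they are rather than guessed, and structural commands such as `\frac` are out of scope.

```python
import re

# A small, illustrative command-to-symbol table (far from complete)
LATEX_SYMBOLS = {
    "alpha": "α", "beta": "β", "gamma": "γ", "theta": "θ",
    "sum": "∑", "int": "∫", "infty": "∞", "sqrt": "√",
}


def latex_symbols_to_unicode(expr: str) -> str:
    """Strip $ delimiters and replace known \\commands with Unicode symbols."""
    body = expr.strip("$")
    return re.sub(
        r"\\([A-Za-z]+)",
        lambda m: LATEX_SYMBOLS.get(m.group(1), m.group(0)),
        body,
    )


print(latex_symbols_to_unicode(r"$\alpha + \beta = \gamma$"))
```

When the target is a Matplotlib figure, its mathtext engine can often render expressions like these directly without a LaTeX installation, which avoids the lossy conversion.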


## Part 8: Practical Applications

### 8.1 Scientific Data Processing Pipeline

In [15]:
# Create a comprehensive text processing pipeline
class TextProcessor:
    def __init__(self):
        self.processing_log = []
    
    def log_step(self, step, input_text, output_text):
        self.processing_log.append({
            'step': step,
            'input': input_text,
            'output': output_text,
            'input_length': len(input_text),
            'output_length': len(output_text)
        })
    
    def process_scientific_text(self, text):
        """Process scientific text through multiple cleaning steps."""
        steps = [
            # squeeze_spaces collapses runs of spaces but keeps tabs and newlines
            ("Space normalization", scitex.str.squeeze_spaces),
            ("ANSI removal", scitex.str.remove_ansi),
            ("LaTeX formatting", scitex.str.safe_to_latex_style),
            # mask_api masks the entire string it receives (see Part 5)
            ("API masking", scitex.str.mask_api),
        ]
        for step_name, func in steps:
            previous = text
            text = func(text)
            self.log_step(step_name, previous, text)
        return text
    
    def print_processing_log(self):
        """Print the processing log with colored output."""
        print(scitex.str.ct("=" * 50, "blue"))
        print(scitex.str.ct("TEXT PROCESSING LOG", "blue"))
        print(scitex.str.ct("=" * 50, "blue"))
        
        for i, entry in enumerate(self.processing_log, 1):
            print(f"\n{scitex.str.ct(f'Step {i}: ' + entry['step'], 'green')}")
            print(f"Input  ({entry['input_length']} chars): '{entry['input'][:50]}{'...' if len(entry['input']) > 50 else ''}'")
            print(f"Output ({entry['output_length']} chars): '{entry['output'][:50]}{'...' if len(entry['output']) > 50 else ''}'")

# Test the pipeline
processor = TextProcessor()

test_scientific_texts = [
    "\033[31mTemperature    measurements\033[0m   showed   alpha = 0.05   significance with API_KEY=sk-1234567890abcdef",
    "\t\nPressure   data\t\ncontains    beta   coefficients   SECRET_TOKEN=ghp_abcdef1234567890\n",
    "The   gamma   distribution   parameters   were   DATABASE_URL=postgresql://user:pass@host:5432/db"
]

print("Scientific Text Processing Pipeline:")
print("=" * 40)

for i, text in enumerate(test_scientific_texts, 1):
    print(f"\nProcessing text {i}:")
    processed = processor.process_scientific_text(text)
    print(f"Final result: '{processed}'")
    print()

# Show processing log
processor.print_processing_log()

Scientific Text Processing Pipeline:

Processing text 1:
Final result: 'Temp****cdef'


Processing text 2:
Final result: '	
Pr****890
'


Processing text 3:
Final result: 'The ****2/db'

[94mTEXT PROCESSING LOG[0m

[92mStep 1: Space normalization[0m
Input  (108 chars): '[31mTemperature    measurements[0m   showed   al...'
Output (99 chars): '[31mTemperature measurements[0m showed alpha = 0...'

[92mStep 2: ANSI removal[0m
Input  (99 chars): '[31mTemperature measurements[0m showed alpha = 0...'
Output (90 chars): 'Temperature measurements showed alpha = 0.05 signi...'

[92mStep 3: LaTeX formatting[0m
Input  (90 chars): 'Temperature measurements showed alpha = 0.05 signi...'
Output (90 chars): 'Temperature measurements showed alpha = 0.05 signi...'

[92mStep 4: API masking[0m
Input  (90 chars): 'Temperature measurements showed alpha = 0.05 signi...'
Output (12 chars): 'Temp****cdef'

[92mStep 5: Space normalization[0m
Input  (87 chars): '	
Pressure   data	
contains    b
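
The log shows two side effects of this pipeline. `squeeze_spaces` leaves tabs and newlines in place, and `mask_api` replaces the whole sentence with `Temp****cdef`, destroying the scientific content. The sketch below uses an alternative ordering, assuming the goal is to keep the prose and hide only `NAME=value` secrets:

1. Strip ANSI codes.
2. Collapse all whitespace.
3. Mask the matching secret values.

It uses only the standard library. The secret-name pattern and the helper names are assumptions.

```python
import re

ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")
# Assumed convention: secrets appear as NAME=value with a telltale NAME
SECRET_PAIR = re.compile(r"\b(\w*(?:KEY|TOKEN|SECRET|URL)\w*)=(\S+)")


def mask(value: str) -> str:
    """Keep four characters at each end of long values; fully mask short ones."""
    return value[:4] + "****" + value[-4:] if len(value) > 8 else "*" * len(value)


def clean_scientific_text(text: str) -> str:
    """Remove ANSI codes, normalize whitespace, then mask secret values."""
    text = ANSI_SGR.sub("", text)
    text = " ".join(text.split())
    return SECRET_PAIR.sub(lambda m: f"{m.group(1)}={mask(m.group(2))}", text)


raw = "\033[31mTemperature    measurements\033[0m   showed   alpha = 0.05   significance with API_KEY=sk-1234567890abcdef"
print(clean_scientific_text(raw))
```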

### 8.2 Report Generation with Formatted Text

In [16]:
# Generate a formatted scientific report
def generate_scientific_report(experiment_data):
    """Generate a formatted scientific report."""
    
    # Report header
    report = []
    report.append(scitex.str.ct("=" * 60, "blue"))
    report.append(scitex.str.ct("SCIENTIFIC EXPERIMENT REPORT", "blue"))
    report.append(scitex.str.ct("=" * 60, "blue"))
    report.append("")
    
    # Experiment info
    report.append(scitex.str.ct("EXPERIMENT INFORMATION", "green"))
    report.append("-" * 30)
    report.append(f"Name: {experiment_data['name']}")
    report.append(f"Date: {experiment_data['date']}")
    report.append(f"Researcher: {experiment_data['researcher']}")
    report.append("")
    
    # Parameters
    report.append(scitex.str.ct("PARAMETERS", "green"))
    report.append("-" * 15)
    for param, value in experiment_data['parameters'].items():
        formatted_param = scitex.str.to_latex_style(param)
        report.append(f"{formatted_param}: {value}")
    report.append("")
    
    # Results
    report.append(scitex.str.ct("RESULTS", "green"))
    report.append("-" * 10)
    for metric, value in experiment_data['results'].items():
        if isinstance(value, float):
            formatted_value = f"{value:.4f}"
        else:
            formatted_value = str(value)
        report.append(f"{metric}: {formatted_value}")
    report.append("")
    
    # File sizes
    if 'file_sizes' in experiment_data:
        report.append(scitex.str.ct("FILE SIZES", "green"))
        report.append("-" * 12)
        for filename, size in experiment_data['file_sizes'].items():
            readable_size = scitex.str.readable_bytes(size)
            report.append(f"{filename}: {readable_size}")
        report.append("")
    
    # Status
    status = experiment_data.get('status', 'unknown')
    if status == 'success':
        status_line = scitex.str.ct(f"Status: {status.upper()}", "green")
    elif status == 'warning':
        status_line = scitex.str.ct(f"Status: {status.upper()}", "yellow")
    else:
        status_line = scitex.str.ct(f"Status: {status.upper()}", "red")
    
    report.append(status_line)
    report.append("")
    report.append(scitex.str.ct("=" * 60, "blue"))
    
    return "\n".join(report)

# Sample experiment data
experiment_data = {
    'name': 'Neural Network Performance Analysis',
    'date': '2024-01-15',
    'researcher': 'Dr. Jane Smith',
    'parameters': {
        'alpha': 0.001,
        'beta': 0.9,
        'gamma': 0.999,
        'lambda': 0.01,
        'epochs': 100,
        'batch_size': 32
    },
    'results': {
        'accuracy': 0.9542,
        'precision': 0.9123,
        'recall': 0.8876,
        'f1_score': 0.8998,
        'training_time': '2h 45m'
    },
    'file_sizes': {
        'model.pkl': 15728640,  # 15 MiB (15 * 1024**2 bytes)
        'training_data.csv': 104857600,  # 100 MiB
        'results.json': 2048,  # 2 KiB
        'logs.txt': 524288  # 512 KiB
    },
    'status': 'success'
}

# Generate and print the report
report = generate_scientific_report(experiment_data)
print(report)

# Save report to file
report_file = data_dir / "experiment_report.txt"
with open(report_file, 'w', encoding='utf-8') as f:
    # Remove color codes for file output
    clean_report = scitex.str.remove_ansi(report)
    f.write(clean_report)

print(f"\nReport saved to: {report_file}")
print(f"Report file size: {scitex.str.readable_bytes(report_file.stat().st_size)}")

SCIENTIFIC EXPERIMENT REPORT

EXPERIMENT INFORMATION
------------------------------
Name: Neural Network Performance Analysis
Date: 2024-01-15
Researcher: Dr. Jane Smith

PARAMETERS
---------------
alpha: 0.001
beta: 0.9
gamma: 0.999
lambda: 0.01
epochs: 100
batch_size: 32

RESULTS
----------
accuracy: 0.9542
precision: 0.9123
recall: 0.8876
f1_score: 0.8998
training_time: 2h 45m

FILE SIZES
------------
model.pkl: 15.0 MiB
training_data.csv: 100.0 MiB
results.json: 2.0 KiB
logs.txt: 512.0 KiB

Status: SUCCESS

Report saved to: str_examples/experiment_report.txt
Report file size: 699.0 B
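The file sizes in the report use binary prefixes: 15,728,640 bytes is exactly 15 × 1024² bytes, so it prints as 15.0 MiB rather than 15.7 MB. The stand-in below reproduces these values, assuming 1024-based units and one decimal place. `readable_bytes_sketch` is a hypothetical helper for illustration, not SciTeX's `readable_bytes()` implementation.

```python
def readable_bytes_sketch(num: float, suffix: str = "B") -> str:
    """Format a byte count with binary (1024-based) prefixes and one decimal place."""
    for prefix in ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"]:
        if abs(num) < 1024.0:
            return f"{num:.1f} {prefix}{suffix}"
        num /= 1024.0  # move up one binary prefix
    return f"{num:.1f} Yi{suffix}"


file_sizes = {
    "model.pkl": 15728640,           # 15 * 1024**2
    "training_data.csv": 104857600,  # 100 * 1024**2
    "results.json": 2048,            # 2 * 1024
    "logs.txt": 524288,              # 512 * 1024
}
for name, size in file_sizes.items():
    print(f"{name}: {readable_bytes_sketch(size)}")
```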


## Summary and Best Practices

This tutorial demonstrated the main string processing utilities in the SciTeX `str` module:

### Key Features Covered:
1. **Text Cleaning**: `clean_path()`, `squeeze_spaces()`, `remove_ansi()`
2. **Search and Replace**: `search()`, `grep()`, `replace()`
3. **Colored Output**: `color_text()`, `ct()` for enhanced readability
4. **Debug Tools**: `print_debug()`, `printc()` for development
5. **LaTeX Support**: `to_latex_style()`, `safe_latex_render()` with fallbacks
6. **Scientific Formatting**: `scientific_text()`, `format_plot_text()`
7. **Security**: `mask_api()` for sensitive data protection
8. **Utility Functions**: `readable_bytes()`, `factor_out_digits()`
9. **Smart Formatting**: `smart_tick_formatter()`, `axis_label()`
10. **LaTeX Fallback System**: Robust handling of LaTeX unavailability

### Best Practices:
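- Use **text cleaning** (`clean_path()`, `squeeze_spaces()`, `remove_ansi()`) before parsing or storing scientific text
- Apply **API masking** (`mask_api()`) to the key itself, not to whole sentences: in the pipeline above, masking the full 90-character text reduced it to `'Temp****cdef'`
- Use **colored output** (`ct()`, `color_text()`) on the console, and strip it with `remove_ansi()` before writing files, as the report example does
- Provide **LaTeX fallbacks** (`safe_latex_render()`) so scientific text still renders when LaTeX is unavailable
- Use **smart formatting** (`smart_tick_formatter()`, `axis_label()`) to keep plot axes readable
- Use **debug tools** (`print_debug()`, `printc()`) during development
- Use **readable byte formatting** (`readable_bytes()`) when reporting file sizes
- Run text through a **fixed sequence of processing steps** so every input is handled the same way, and inspect the output of each step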
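The report cell colors its console output with `ct()` and strips the codes with `remove_ansi()` before saving the file. A minimal standard-library sketch of that round trip follows, assuming standard ANSI SGR color codes (92 for bright green, 94 for bright blue). `colorize` and `strip_ansi` are hypothetical stand-ins, not the SciTeX implementations.

```python
import re

# CSI escape sequences: ESC, '[', optional numeric parameters, then a final letter
# (e.g. "\x1b[92m" sets bright green, "\x1b[0m" resets all attributes).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def colorize(text: str, sgr_code: int) -> str:
    """Wrap text in an SGR color code followed by a reset code."""
    return f"\x1b[{sgr_code}m{text}\x1b[0m"


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences so the text is safe for plain-text files."""
    return ANSI_CSI.sub("", text)


lines = [colorize("RESULTS", 92), "accuracy: 0.9542", colorize("Status: SUCCESS", 92)]
report = "\n".join(lines)
clean = strip_ansi(report)
print(clean)
print(f"{len(report)} chars on the console, {len(clean)} chars in the file")
```

Each colored line carries 9 extra characters (a 5-character color code plus a 4-character reset), which is why files written without stripping look cluttered in plain-text editors.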

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [17]:
# Cleanup
import shutil

cleanup = input("Clean up example files? (y/n): ").lower().startswith('y')
if cleanup:
    shutil.rmtree(data_dir)
    print("✓ Example files cleaned up")
else:
    print(f"Example files preserved in: {data_dir}")
    print(f"Files created: {len(list(data_dir.rglob('*')))}")
    total_size = sum(f.stat().st_size for f in data_dir.rglob('*') if f.is_file())
    print(f"Total size: {scitex.str.readable_bytes(total_size)}")

StdinNotImplementedError: raw_input was called, but this frontend does not support input requests.
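The cleanup cell fails under papermill because `input()` needs an interactive frontend, and papermill executes notebooks without one. A minimal non-interactive sketch follows, assuming a hypothetical `SCITEX_CLEANUP` environment variable decides whether to delete the example files. Papermill can also inject such a flag through a cell tagged `parameters`.

```python
import os
import shutil
from pathlib import Path


def should_clean_up(default: bool = False) -> bool:
    """Decide whether to delete example files without blocking on input().

    Reads the hypothetical SCITEX_CLEANUP environment variable ("1", "true",
    "yes", or "y" means clean up) and falls back to `default` when it is unset.
    """
    value = os.environ.get("SCITEX_CLEANUP")
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y"}


data_dir = Path("./str_examples")
if should_clean_up():
    shutil.rmtree(data_dir, ignore_errors=True)
    print("Example files cleaned up")
else:
    # Count regular files only (directories are skipped).
    files = [f for f in data_dir.rglob("*") if f.is_file()] if data_dir.exists() else []
    print(f"Example files preserved in: {data_dir}")
    print(f"Files created: {len(files)}")
    print(f"Total size (bytes): {sum(f.stat().st_size for f in files)}")
```

Defaulting to "keep the files" means an unattended run never deletes anything unless the flag is set explicitly.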