# March 6 2024

# Regular Expression

### Example 1: Ruby Regular Expressions

In [95]:
line = "CSEISE337 SBU"
puts line =~ /337/

puts line

6
CSEISE337 SBU


In [3]:
# (337)+ means one or more consecutive copies of "337"
if line =~ /(337)+/
    puts "The string contains 337"
end

In [4]:
puts line.class

String


In [5]:
puts /337/.class

Regexp


In [7]:
puts //.class


Regexp
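The `/.../` literal used above is not the only way to build a `Regexp`. The `%r{...}` literal is convenient when the pattern contains `/`, and `Regexp.new` builds a pattern from a string at runtime. Below is a minimal sketch; the variable names are illustrative only.

```ruby
line = "CSEISE337 SBU"

# Three ways to build a Regexp object
r1 = /337/
r2 = %r{337}            # %r literal: "/" inside needs no escaping
r3 = Regexp.new("337")  # built at runtime from a String

puts r1.class, r2.class, r3.class   # Regexp, printed three times
puts r1.source == r3.source         # true: same pattern text
puts line =~ r2                     # 6

# %r is handy for path-like patterns such as the song files used later
puts "/jazz/j00132.mp3" =~ %r{^/jazz/}   # 0
```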


### Back Referencing

In [70]:
"CSEISE337"=~/(CSE)(ISE)(337)/
puts $1
puts $2
puts $3

CSE
ISE
337
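After a match, `$1`, `$2` and `$3` read the captured groups. The same group numbers can also appear *inside* the pattern as `\1`, `\2`, and so on, meaning "match the same text again". Regex documentation usually calls this a back reference. The groups can also be reused in a replacement string. This is a small sketch with made-up sample strings.

```ruby
# \1 inside the pattern must repeat whatever group 1 captured
word = "Hello"
if word =~ /(\w)\1/
  puts "doubled letter: #{$1} at index #{$~.begin(0)}"   # doubled letter: l at index 2
end

# Find a word that appears twice in a row
text = "the the quick fox"
puts text[/\b(\w+) \1\b/]   # the the

# Groups reused in a replacement string (single quotes keep \1 etc. intact)
puts "CSEISE337".sub(/(CSE)(ISE)(337)/, '\2\1\3')   # ISECSE337
```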


### Scan Method for Regular Expression

In [77]:
s = "Hello World"
# \w{2} takes non-overlapping pairs of word characters; the leftover "o" and "d" are skipped
t = s.scan(/\w{2}/).length
t2 = s.scan(/\w{2}/)
puts t
puts t2

4
He
ll
Wo
rl


In [50]:
string = "The quick brown fox jumps over the lazy dog."
# \S+ \S+ \S+ matches three space-separated "words" at a time
matches = string.scan(/\S+ \S+ \S+/)
puts matches.inspect  # Output: ["The quick brown", "fox jumps over", "the lazy dog."]


["The quick brown", "fox jumps over", "the lazy dog."]
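When the pattern contains capture groups, `scan` returns the groups rather than the whole match, as one array per match. If you pass a block, each match is yielded to it. The sketch below uses a made-up string of course codes to show both behaviours.

```ruby
courses = "CSE337 ISE337 AMS301"

# No groups: an array of whole matches
p courses.scan(/[A-Z]+\d+/)       # ["CSE337", "ISE337", "AMS301"]

# With groups: one [dept, number] pair per match
p courses.scan(/([A-Z]+)(\d+)/)   # [["CSE", "337"], ["ISE", "337"], ["AMS", "301"]]

# Block form: the groups of each match are passed to the block
courses.scan(/([A-Z]+)(\d+)/) { |dept, num| puts "#{dept}-#{num}" }
```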


### Pattern-based Substitution using String.sub and String.gsub

In [80]:
a = "the quick brown fox"
puts a.sub(/\s\S+/, '') # the brown fox
puts a.gsub(/\s\S+/, '') # the
puts a.sub(/^./) { $&.upcase } # The quick brown fox
puts a.gsub(/[aeiou]/) { $&.upcase } # thE qUIck brOwn fOx

the brown fox
the
The quick brown fox
thE qUIck brOwn fOx
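Besides a block, `sub` and `gsub` accept two other kinds of replacement. A replacement *string* can refer to capture groups as `\1` or `\k<name>`. A *Hash* maps each matched text to its replacement, and any match missing from the hash is replaced with an empty string. This is a short sketch on made-up inputs.

```ruby
name = "Fats Waller"

# \1 and \2 in the replacement refer to the numbered groups
puts name.sub(/(\w+) (\w+)/, '\2, \1')   # Waller, Fats

# Named groups are referenced as \k<name>
puts name.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last> \k<first>')   # Waller Fats

# A Hash maps each matched text to its replacement
puts "the quick brown fox".gsub(/[eo]/, "e" => "3", "o" => "0")   # th3 quick br0wn f0x
```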


### Match Operators

In [85]:
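# Show where re matches in a: $` is the text before the match,
# $& is the match itself, and $' is the text after it
def showRE(a, re)
    if a =~ re
        "#{$`}<<#{$&}>>#{$'}"
    else
        "no match"
    end
end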
puts showRE('very interesting', /t/)
puts showRE('Fats Waller', /ll/)

very in<<t>>eresting
Fats Wa<<ll>>er
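The same output can be built from the `MatchData` object that `Regexp#match` returns. Its `pre_match`, `[0]` and `post_match` correspond to `` $` ``, `$&` and `$'`. The method name `show_re_md` in this sketch is hypothetical.

```ruby
# Same idea as showRE, but reading the parts from MatchData instead of globals
def show_re_md(str, re)
  m = re.match(str)
  return "no match" unless m
  "#{m.pre_match}<<#{m[0]}>>#{m.post_match}"
end

puts show_re_md("very interesting", /t/)   # very in<<t>>eresting
puts show_re_md("Fats Waller", /ll/)       # Fats Wa<<ll>>er
puts show_re_md("Fats Waller", /zz/)       # no match
```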


### Anchors

In [86]:
puts showRE("this is\nthe time", /^the/) 
puts showRE("this is\nthe time", /is$/) 
puts showRE("this is\nthe time", /\Athis/) 
puts showRE("this is\nthe time", /\Athe/) 
puts showRE("this is\nthe time", /\bis/) 
puts showRE("this is\nthe time", /\Bis/)

this is
<<the>> time
this <<is>>
the time
<<this>> is
the time
no match
this <<is>>
the time
th<<is>> is
the time
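As the output above shows, `^` and `$` anchor to *lines*, while `\A` anchors to the start of the whole string. This matters for validation, because a check written with `^...$` passes as soon as any single line matches. The sketch below compares the two; `digits_line?` and `digits_only?` are hypothetical helper names. It also shows `\Z`, which allows one trailing newline, and `\z`, which does not.

```ruby
# Line anchors: passes if any line consists only of digits
def digits_line?(str)
  str.match?(/^\d+$/)
end

# String anchors: the whole string must be digits
def digits_only?(str)
  str.match?(/\A\d+\z/)
end

input = "337\nnot a number"
puts digits_line?(input)   # true  ("337" is a complete line)
puts digits_only?(input)   # false (the whole string is not digits)
puts digits_only?("337")   # true

puts "337\n".match?(/\A\d+\Z/)   # true  (\Z allows one trailing newline)
puts "337\n".match?(/\A\d+\z/)   # false
```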


### Regexp#match Method

In [87]:
# Ruby code for the Regexp#match method

# Declare the Regexp values
reg_a = /a/
reg_b = /337/
reg_c = /a/

# match returns a MatchData object (or nil); interpolating it prints the matched text
puts "Regexp match form : #{reg_a.match("abcd")}\n\n"

puts "Regexp match form : #{reg_b.match("CSEISE337")}\n\n"

puts "Regexp match form : #{reg_c.match("playway")}\n\n"


Regexp match form : a

Regexp match form : 337

Regexp match form : a
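The interpolated output above hides what `match` actually returns. It returns a `MatchData` object on success and `nil` otherwise. An optional second argument gives the index where the search starts. `match?` answers only yes or no, so it does not create a `MatchData` object or set `$~`. This is a small sketch on the same sample strings.

```ruby
reg = /a/

m = reg.match("playway")       # the first "a" is at index 2
puts m.class                   # MatchData
puts m.begin(0)                # 2

m2 = reg.match("playway", 3)   # start searching at index 3
puts m2.begin(0)               # 5

p reg.match("CSEISE337")       # nil (no lowercase "a")
puts reg.match?("playway")     # true
```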



### Song Example
##### Try to make a cleaner version! (Homework)

In [93]:
data = <<STR_TERM
/jazz/j00132.mp3 | 3:45 | Fats Waller | Aint Misbehavin
/jazz/j00319.mp3 | 2:58 | Louis Armstrong | Wonderful World
/bgrass/bg0732.mp3| 4:09 | Strength in Numbers | Texas Red
STR_TERM

data = data.split("\n")
puts data
songs = Array.new
data.length.times { |i|
    # Split the line on "|", dropping the spaces around each separator
    tmp = data[i].split(/\s*\|\s*/)
    puts "*****"
    puts tmp
    tmp[2].squeeze!(" ")                # collapse runs of spaces in the artist name
    tmp[2].gsub!(/\b\w/) { $&.upcase }  # capitalize the first letter of each word
    mins, secs = tmp[1].scan(/\d+/)     # "3:45" -> ["3", "45"]
    tmp[1] = mins.to_i*60 + secs.to_i   # song length in seconds
    songs[i] = tmp                      # store the whole record
}
puts "*****"
puts songs.inspect

/jazz/j00132.mp3 | 3:45 | Fats Waller | Aint Misbehavin
/jazz/j00319.mp3 | 2:58 | Louis Armstrong | Wonderful World
/bgrass/bg0732.mp3| 4:09 | Strength in Numbers | Texas Red
*****
/jazz/j00132.mp3
3:45
Fats Waller
Aint Misbehavin
*****
/jazz/j00319.mp3
2:58
Louis Armstrong
Wonderful World
*****
/bgrass/bg0732.mp3
4:09
Strength in Numbers
Texas Red
*****
[["/jazz/j00132.mp3", 225, "Fats Waller", "Aint Misbehavin"], ["/jazz/j00319.mp3", 178, "Louis Armstrong", "Wonderful World"], ["/bgrass/bg0732.mp3", 249, "Strength In Numbers", "Texas Red"]]
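For the homework, here is one possible cleaner version, offered as a sketch rather than the expected answer. It keeps the same parsing steps but stores each record in a `Struct` and builds the array with `map`. The `Song` struct and its field names are hypothetical choices. `each_line(chomp: true)` needs Ruby 2.4 or later.

```ruby
# Hypothetical record type for one song
Song = Struct.new(:file, :seconds, :artist, :title)

data = <<~STR_TERM
  /jazz/j00132.mp3 | 3:45 | Fats Waller | Aint Misbehavin
  /jazz/j00319.mp3 | 2:58 | Louis Armstrong | Wonderful World
  /bgrass/bg0732.mp3| 4:09 | Strength in Numbers | Texas Red
STR_TERM

songs = data.each_line(chomp: true).map do |row|
  file, length, artist, title = row.split(/\s*\|\s*/)
  mins, secs = length.split(":").map(&:to_i)                   # "4:09" -> [4, 9]
  artist = artist.squeeze(" ").gsub(/\b\w/) { |c| c.upcase }   # capitalize each word
  Song.new(file, mins * 60 + secs, artist, title)
end

songs.each { |s| puts "#{s.artist}: #{s.title} (#{s.seconds}s)" }
```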

## <<STR_TERM Marker in Ruby

In Ruby, `<<STR_TERM` starts a "here document" or "heredoc", a syntax for writing a multiline string literal.

`STR_TERM` is the delimiter that marks the end of the string. Any identifier works, and `STR_TERM` is just the name chosen here. By convention the delimiter is uppercase. It may be written bare or in double quotes, and both forms allow `#{}` interpolation and escape sequences. Written in single quotes (`<<'STR_TERM'`), it makes the body literal.

The string begins on the line after `<<STR_TERM`. Every line up to the line containing only the delimiter is part of the string. With plain `<<STR_TERM`, the closing delimiter must start at the very beginning of its line and be the only text on it. The `<<-STR_TERM` form allows the closing delimiter to be indented. The `<<~STR_TERM` form also strips the common leading indentation from the body.

Here's an example:

```ruby
data = <<STR_TERM
This is a multiline string.
It can contain multiple lines of text.
STR_TERM
```

In this example, `STR_TERM` marks the end of the multiline string. The string starts with the line after `<<STR_TERM` and ends with the line before `STR_TERM`. So, `data` will contain:

```
This is a multiline string.
It can contain multiple lines of text.
```

Heredocs are useful for embedding large blocks of text, such as SQL queries, HTML templates, or any other multiline text, directly into Ruby code. It helps improve code readability and maintainability by keeping the text formatting intact.
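The heredoc variants differ mainly in interpolation and indentation handling. This small sketch compares them; the method names `dashed` and `squiggly` are hypothetical.

```ruby
name = "SBU"

# Plain heredoc: #{} interpolation works; the terminator starts in column 0
plain = <<STR_TERM
Hello #{name}
STR_TERM

# Single-quoted delimiter: the body is taken literally, no interpolation
literal = <<'STR_TERM'
Hello #{name}
STR_TERM

# <<- lets the terminator be indented, but the body keeps its indentation
def dashed
  <<-STR_TERM
    kept as is
  STR_TERM
end

# <<~ also removes the common leading indentation from the body
def squiggly(name)
  <<~STR_TERM
    Dear #{name},
      Welcome to CSE337.
  STR_TERM
end

puts plain            # Hello SBU
puts literal          # Hello #{name}
p dashed              # "    kept as is\n"
print squiggly(name)  # "Dear SBU," then "  Welcome to CSE337." on the next line
```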

## The `$&` Variable in Ruby

In Ruby's regular expressions, `$&` holds the text matched by the last successful pattern match. Despite the `$` prefix, it is not a true global. Ruby keeps it per method call and per thread, and the next successful match overwrites it.

Here's a simple example to illustrate its usage:

```ruby
string = "hello world"
pattern = /hello/
string =~ pattern
puts $&  # Output: hello
```

In this example:

- We have a string `"hello world"`.
- We define a regular expression pattern `/hello/` to match the word "hello".
- We use the `=~` operator to perform a pattern match operation on the string.
- After the match, `$&` contains the matched substring, which is `"hello"`, and we print it.

However, `$&`, `$1` and the related Perl-style variables are implicit, and the next successful match replaces them. Code that reads them far from the match is therefore easy to break. Ruby style guides generally recommend two alternatives that keep the matched text in an ordinary variable: the `MatchData` object returned by `match` (also available as `Regexp.last_match`), or the array returned by `scan`.
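This sketch shows how `Regexp.last_match` relates to the special variables. A saved `MatchData` survives later matches, while `$&` does not.

```ruby
string = "hello world"
string =~ /(hel)lo/

m = Regexp.last_match   # the MatchData that $~ also refers to
puts m[0]               # hello   (the text $& holds)
puts m[1]               # hel     (the text $1 holds)
p m.pre_match           # ""      (the text $` holds)
p m.post_match          # " world" (the text $' holds)

# The next successful match overwrites $~ and $&, but not the saved MatchData
"abc" =~ /b/
puts $&                 # b
puts m[0]               # hello
```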

# Examples 

In [53]:
# Find the word 'like'
"Do you like cats?" =~ /like/

7

In [54]:
if "Do you like cats?".match(/like/)
  puts "Match found!"
end

Match found!


In [55]:
def contains_vowel(str)
  str =~ /[aeiou]/
end

contains_vowel("test") # returns 1
contains_vowel("sky")  # returns nil

In [56]:
def contains_number(str)
  str =~ /[0-9]/
end

contains_number("The year is 2015")  # returns 12
contains_number("The cat is black")  # returns nil
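The two helpers above return an index or `nil`. When only a yes or no answer is needed, `match?` returns a real boolean and does not set `$~`. This is a sketch using hypothetical `?`-suffixed names, following the Ruby convention for predicates.

```ruby
def contains_vowel?(str)
  str.match?(/[aeiou]/)
end

def contains_number?(str)
  str.match?(/\d/)   # \d is shorthand for [0-9]
end

puts contains_vowel?("test")               # true
puts contains_vowel?("sky")                # false
puts contains_number?("The year is 2015")  # true
puts contains_number?("The cat is black")  # false
```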

In [57]:
# Unescaped, "." matches any character, so the letter "a" matches too
"5a5".match(/\d.\d/)

# Escaped as "\.", only a literal dot matches
"5a5".match(/\d\.\d/) # nil
"5.5".match(/\d\.\d/) # match

#<MatchData "5.5">
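The same problem appears when a pattern is built from a variable. Interpolated text keeps its special meaning, so the dot still acts as a wildcard. `Regexp.escape` turns every special character into a literal. This is a minimal sketch with a made-up value.

```ruby
price = "5.5"

# Raw interpolation: the "." in price is still a wildcard
loose = /\A#{price}\z/
puts "5a5".match?(loose)    # true

# Regexp.escape makes the "." literal
strict = /\A#{Regexp.escape(price)}\z/
puts Regexp.escape(price)   # 5\.5
puts "5a5".match?(strict)   # false
puts "5.5".match?(strict)   # true
```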

In [58]:
# Note that this will also match some invalid IP addresses,
# like 999.999.999.999, but in this case we just care about the format.

def ip_address?(str)
  # \A and \z anchor to the whole string; ^ and $ only anchor to a line,
  # so "1.2.3.4\nanything" would also pass with them
  # We use !! to convert the return value to a boolean
  !!(str =~ /\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/)
end

ip_address?("192.168.1.1")  # returns true
ip_address?("0000.0000")    # returns false

false
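To also reject addresses like 999.999.999.999, one option is to check the format first and then compare each part numerically. This is a sketch only; `strict_ip_address?` is a hypothetical name. It still accepts leading zeros such as "01.2.3.4", and Ruby's standard library also has an `IPAddr` class for real address handling.

```ruby
def strict_ip_address?(str)
  # Four groups of 1 to 3 digits separated by dots, covering the whole string
  return false unless str.match?(/\A\d{1,3}(?:\.\d{1,3}){3}\z/)
  # Each part must be in 0..255
  str.split(".").all? { |part| part.to_i <= 255 }
end

puts strict_ip_address?("192.168.1.1")      # true
puts strict_ip_address?("999.999.999.999")  # false
puts strict_ip_address?("0000.0000")        # false
```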

In [59]:
# We want to find out whether this string is exactly four letters long. This
# still matches, because the string contains four word characters in a row,
# but it's not what we want.
"Regex are cool".match(/\w{4}/)

# Instead we use the 'beginning of line' (^) and 'end of line' ($) anchors
"Regex are cool".match(/^\w{4}$/)

# This time it won't match. This is a rather contrived example, since we could
# just have used .size to find the length, but it gets the idea across.

In [60]:
Line = Struct.new(:time, :type, :msg)
LOG_FORMAT = /(\d{2}:\d{2}) (\w+) (.*)/

def parse_line(line)
  line.match(LOG_FORMAT) { |m| Line.new(*m.captures) }
end

parse_line("12:41 INFO User has logged in.")
# This produces objects like this:
# 

#<struct time="12:41", type="INFO", msg="User has logged in.">
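The same `parse_line` can be applied to a whole log. Lines that do not match return `nil` from `match`, so `compact` drops them. The sample log text below is made up for illustration.

```ruby
Line = Struct.new(:time, :type, :msg)
LOG_FORMAT = /(\d{2}:\d{2}) (\w+) (.*)/

def parse_line(line)
  # With a block, match runs the block only on success; otherwise it returns nil
  line.match(LOG_FORMAT) { |m| Line.new(*m.captures) }
end

log = <<~LOG
  12:41 INFO User has logged in.
  12:43 WARN Disk is almost full.
  this line has no timestamp
LOG

entries = log.each_line(chomp: true).map { |l| parse_line(l) }.compact
entries.each { |e| puts "#{e.type} at #{e.time}: #{e.msg}" }
```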

In [61]:
m = "John 31".match(/\w+ (\d+)/)

m[1]
# => "31"

"31"

In [62]:
m = "David 30".match(/(?<name>\w+) (?<age>\d+)/)
m[:age]
# => "30"
m[:name]
# => "David"

"David"
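Named groups can also be read all at once with `named_captures`. Ruby also supports one shortcut: when a regex *literal* without interpolation is on the left of `=~`, its named groups become local variables. This is a small sketch on the same kind of input.

```ruby
m = "David 30".match(/(?<name>\w+) (?<age>\d+)/)
puts m.named_captures["age"]   # 30

# Regex literal on the left of =~: named groups become local variables
if /(?<name>\w+) (?<age>\d+)/ =~ "John 31"
  puts "#{name} is #{age}"     # John is 31
end
```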

#### Note

In Ruby, the `inspect` method is used to return a human-readable representation of an object. It's commonly used for debugging purposes or when you want to see the internal state of an object.

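When you call `inspect` on an object, it returns a string describing that object. For many built-in types (numbers, strings, symbols, arrays, hashes), this string looks like the Ruby literal that would recreate the object. For a plain object, it is a generic form showing the class name, an object address and the instance variables.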

Here's a basic example:

```ruby
array = [1, 2, 3]
puts array.inspect  # Output: [1, 2, 3]
```

In this example, `inspect` is called on an array object `array`, and it returns a string representation of the array.

Ruby calls `inspect` for you in a few places. `p` prints `obj.inspect`, `pp` prints a pretty-printed version, and interactive shells such as IRB (and the notebook cells above) display a value in its `inspect` form. By contrast, `puts` and string interpolation with `#{}` use `to_s`, and `puts` prints each element of an array on its own line.

Here's an example of the difference:

```ruby
array = [1, 2, 3]
p array     # Output: [1, 2, 3]
puts array  # Output: 1, 2 and 3, each on its own line
```

In this case, `p` calls `inspect` on `array` before printing it, while `puts` prints the elements one per line.

It's worth noting that the `inspect` method can be overridden in custom classes to provide a customized string representation of objects. This can be useful for providing more meaningful output when debugging or displaying objects.
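This is a minimal sketch of such an override. The `Track` class is hypothetical. It defines `inspect` for debugging output and `to_s` for display, so `p` and `puts` print different things. `Array#inspect` calls `inspect` on each element.

```ruby
class Track
  attr_reader :title, :seconds

  def initialize(title, seconds)
    @title = title
    @seconds = seconds
  end

  # Used by p, pp and interactive shells
  def inspect
    "#<Track #{title} (#{seconds}s)>"
  end

  # Used by puts and string interpolation
  def to_s
    title
  end
end

t = Track.new("Texas Red", 249)
p t           # #<Track Texas Red (249s)>
puts t        # Texas Red
p [t]         # [#<Track Texas Red (249s)>]
puts "#{t}"   # Texas Red
```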