# Chapter-7 Strings
This notebook contains the sample source code explained in the book *Hands-On Julia Programming, Sambit Kumar Dash, 2021, bpb Publications. All Rights Reserved*.

In [1]:
using Pkg
pkg"activate ."
pkg"instantiate"

[32m[1m  Activating[22m[39m environment at `C:\Users\smile\OneDrive\Desktop\Tanvi\Woxen\Classes\Term 1\Python and Julia\Julia\Hands-on-Julia-Programming\Chapter 07\Project.toml`


## 7.1 Introduction

A string can be thought of as a sequence of characters. For a detailed discussion, please refer to the corresponding book chapter.

## 7.2 String

Simple examples of strings created with the different literal forms: plain quotes, triple quotes, interpolation, and escape sequences.

In [2]:
str = "This is a string"

"This is a string"

In [3]:
str = """ 
        This is a preformatted 
        "string" """

" \nThis is a preformatted \n\"string\" "

In [6]:
a = "Tanvi"
b = "Basket Ball"

str = "$a loves playing $b "

"Tanvi loves playing Basket Ball "

In [7]:
a = "Jack"
b = "Jill"
c = "100"

str = "$a owes $b $c dollars"

"Jack owes Jill 100 dollars"

In [8]:
str = "This is a \"quoted\\  ' string"

"This is a \"quoted\\  ' string"

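The literal forms above cover most cases, but interpolation and escaping interact in ways that are easier to see side by side. The following is a minimal sketch using only `Base` features. The variable names (`price`, `qty`, `total`, `literal`, `path`) are made up for illustration and do not come from the book.

```julia
price = 4
qty = 3

# $(...) interpolates the value of an arbitrary expression.
total = "Total: $(price * qty) dollars"

# A backslash before $ suppresses interpolation.
literal = "Cost: \$5"

# raw"..." keeps backslashes as written, which is handy for Windows paths.
path = raw"C:\temp\new"

println(total)
println(literal)
println(path)
```

A raw string is usually the simpler choice when the text contains many backslashes, since no escape sequence other than the one for the quote character itself is interpreted.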
## 7.3 String Methods

Strings are immutable: they cannot be modified in place. String methods combine or operate on strings and return either an attribute of a string or a new string derived from the original.
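
Immutability can be observed directly. The sketch below assumes a plain `String` and shows that indexed assignment is rejected, while a function such as `replace` hands back a new string and leaves the original untouched. The names `threw` and `t` are illustrative only.

```julia
s = "Julia"

# Indexed assignment is not defined for String, so this raises a MethodError.
threw = try
    s[1] = 'j'
    false
catch err
    err isa MethodError
end

# replace builds and returns a new string; s itself is unchanged.
t = replace(s, "J" => "j")

println(threw)
println(s)
println(t)
```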

### Comparisons

In [9]:
s1 = "abc"
s2 = "def"
s1 < s2

true

In [10]:
s2 > s1

true

In [11]:
s1 = "abc"
s2 = "abc"
s1 == s2

true

In [12]:
s1 === s2

true

In [13]:
a1 = "Sin"
a2 = "cos"
a1 < a2  # uppercase letters sort before lowercase ones

true

In [15]:
a1 === a2

false

### Iteration

Strings can be iterated as collections of characters. However, indices refer to code units (bytes) of the UTF-8 encoding, so only the indices that fall on character boundaries are valid.

In [16]:
s = "Julia"
for c in s
    println(c)
end

J
u
l
i
a


In [17]:
s[1], s[2], s[3], s[4], s[5] 

('J', 'u', 'l', 'i', 'a')

In [18]:
s[begin], s[begin+2], s[end-1], s[end]

('J', 'l', 'i', 'a')

In [19]:
s = "\u2200 x \u2203 y"

"∀ x ∃ y"

In [20]:
length(s)

7

In [21]:
sizeof(s)

11

In [22]:
s[1]

'∀': Unicode U+2200 (category Sm: Symbol, math)

In [17]:
s[2]

LoadError: StringIndexError: invalid index [2], valid nearby indices [1]=>'∀', [4]=>' '

In [23]:
s[4]

' ': ASCII/Unicode U+0020 (category Zs: Separator, space)

In [24]:
for c in s
    println(c)
end

∀
 
x
 
∃
 
y


In [25]:
i, l = firstindex(s), lastindex(s)
while i <= l
    println(s[i])
    i = nextind(s, i)
end

∀
 
x
 
∃
 
y
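
The manual `nextind` loop can often be avoided. As a hedged sketch, the helpers below (all from `Base`) list the valid indices, test a single index, and locate the n-th character without assuming one byte per character. The variable names are illustrative.

```julia
s = "\u2200 x \u2203 y"

# eachindex yields only the valid (character-boundary) byte indices.
idx = collect(eachindex(s))

# isvalid tests one index; thisind snaps to the start of the enclosing character.
ok2 = isvalid(s, 2)
start2 = thisind(s, 2)

# nextind(s, 0, n) gives the byte index of the n-th character.
fifth = s[nextind(s, 0, 5)]

println(idx)
println(ok2)
println(start2)
println(fifth)
```

Iterating with `for c in s` or `eachindex(s)` is generally preferable to arithmetic on indices, because it stays correct for multi-byte characters.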


### Split and Concatenate

Both splitting and concatenation return new strings; the original string is never modified.

In [26]:
str = "This is a String"
str[1:4]

"This"

In [27]:
str[1:4]*str[end-6:end]

"This String"

In [28]:
repeat("A:-", 5)

"A:-A:-A:-A:-A:-"

In [29]:
"A:="^4

"A:=A:=A:=A:="

In [30]:
join(["1", "2", "3", "4", "5"])

"12345"

In [31]:
join(["Jack", "Jill", "Cathy", "Trevor"], ", ", " and ")

"Jack, Jill, Cathy and Trevor"

In [32]:
str = "This is a\nString\n"
chomp(str)

"This is a\nString"

In [33]:
chop("October")

"Octobe"

In [34]:
chop("October", head=2, tail=3)

"to"

In [35]:
s = "\u2200 x \u2203 y"
ss = split(s)

4-element Vector{SubString{String}}:
 "∀"
 "x"
 "∃"
 "y"

In [36]:
s = "\u2200,x,\u2203,y"
ss = split(s, ',', limit=2)

2-element Vector{SubString{String}}:
 "∀"
 "x,∃,y"

In [37]:
s = "\u2200,x,\u2203,y"
ss = rsplit(s, ',', limit=2)

2-element Vector{SubString{String}}:
 "∀,x,∃"
 "y"

In [38]:
lpad("string", 10, "p")

"ppppstring"

In [39]:
rpad("string", 10, "s")

"stringssss"

In [40]:
strip("     string 123  ")

"string 123"

In [41]:
strip(" {a}     string 123  ", ['{', 'a', '}', ' '])

"string 123"

In [42]:
strip("     string 123  aaa") do x
    return x == ' ' || x == 'a'
end

"string 123"

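The outputs of `split` above have type `SubString{String}` rather than `String`. The sketch below, which uses only `Base` functions, contrasts a copying slice with a `SubString` view and shows `string(...)` as an alternative to `*` for concatenation. The variable names are illustrative.

```julia
str = "This is a String"

# Slicing with a range copies the characters into a new String.
head = str[1:4]

# SubString creates a view that refers to the parent string without copying.
tail = SubString(str, 11, 16)

# string(...) concatenates its arguments, much like the * operator.
joined = string(head, " ", tail)

# String(...) converts a view into an independent String.
copy_of_tail = String(tail)

println(typeof(head))
println(typeof(tail))
println(joined)
println(typeof(copy_of_tail))
```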
### Case Conversion

In [43]:
uppercase("Julia")

"JULIA"

In [44]:
lowercase("JUliA")

"julia"

In [45]:
titlecase("hands on programming in julia")

"Hands On Programming In Julia"

In [46]:
uppercasefirst("julia")

"Julia"

In [47]:
lowercasefirst("Julia")

"julia"

In [48]:
uppercasefirst("tanvi gorantla")

"Tanvi gorantla"

### Match and Replace

In [49]:
str = "Introduction to Julia"
startswith(str, "Intro")

true

In [50]:
endswith(str, "Julia")

true

In [51]:
contains(str, "to")

true

In [52]:
occursin("to", str)

true

In [53]:
r = findfirst("o", "Introduction to Julia")
while r !== nothing 
    println(r)
    r = findnext("o", "Introduction to Julia", r.stop+1)
end

5:5
11:11
15:15


In [54]:
findlast("o", "Introduction to Julia")

15:15

In [55]:
replace("Introduction to Julia", "o"=>"a")

"Intraductian ta Julia"

In [57]:
replace("Introduction to Julia", "n"=>"m")

"Imtroductiom to Julia"

#### Regular Expressions

Regular expressions are a compact language for describing text patterns. Readers are advised to consult a dedicated text on the topic for a detailed understanding.

In [58]:
rx = Regex("a.a")

r"a.a"

In [59]:
m = match(rx, "abracadabra")

RegexMatch("aca")

In [60]:
m.match

"aca"

In [61]:
m = match(rx, "abracadabra", 5)

RegexMatch("ada")

In [62]:
rx = Regex("a(.)a")
m = match(rx, "abracadabra")
m.captures

1-element Vector{Union{Nothing, SubString{String}}}:
 "c"

In [63]:
rx = Regex("a(?<key>.)a")
m = match(rx, "abracadabra")
m.captures

1-element Vector{Union{Nothing, SubString{String}}}:
 "c"

In [64]:
m["key"]

"c"

In [65]:
rx = r"a.a"
m = eachmatch(rx, "abracadabra", overlap=true)

Base.RegexMatchIterator(r"a.a", "abracadabra", true)

In [66]:
collect(m)

2-element Vector{RegexMatch}:
 RegexMatch("aca")
 RegexMatch("ada")

In [67]:
m = eachmatch(rx, "abracadabra", overlap=false)

Base.RegexMatchIterator(r"a.a", "abracadabra", false)

In [68]:
collect(m)

1-element Vector{RegexMatch}:
 RegexMatch("aca")
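
Two practical points are not covered by the cells above: `match` returns `nothing` when the pattern is absent, and regular expressions can drive `replace`. The following is a minimal sketch with `Base` functions only. The names `found`, `present`, `swapped`, and `offsets` are illustrative.

```julia
text = "abracadabra"

# match returns nothing when the pattern does not occur, so test before use.
m = match(r"a(?<key>.)a", "hello")
found = m !== nothing

# occursin answers the yes/no question without building a RegexMatch.
present = occursin(r"a.a", text)

# In replace, s"..." is a substitution string and \1 is the first capture group.
swapped = replace(text, r"a(.)a" => s"[\1]")

# Each RegexMatch also records the byte index where the match starts.
offsets = [mm.offset for mm in eachmatch(r"a.a", text, overlap=true)]

println(found)
println(present)
println(swapped)
println(offsets)
```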

## 7.4 Encodings

`String` objects are stored internally in the UTF-8 encoding. However, they can be converted to or from other Unicode encoding forms such as UTF-16 or UTF-32 with `transcode`.

In [69]:
s = "\u2200 x \u2203 y"

"∀ x ∃ y"

In [70]:
transcode(UInt16, s)

7-element Vector{UInt16}:
 0x2200
 0x0020
 0x0078
 0x0020
 0x2203
 0x0020
 0x0079

In [71]:
transcode(UInt8, s)

11-element Base.CodeUnits{UInt8, String}:
 0xe2
 0x88
 0x80
 0x20
 0x78
 0x20
 0xe2
 0x88
 0x83
 0x20
 0x79

In [72]:
transcode(UInt32, s)

7-element Vector{UInt32}:
 0x00002200
 0x00000020
 0x00000078
 0x00000020
 0x00002203
 0x00000020
 0x00000079

In [73]:
transcode(String, transcode(UInt16, s))

"∀ x ∃ y"

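In the example above, UTF-16 and UTF-32 both need seven code units, because every character lies in the Basic Multilingual Plane. As a hedged illustration, the sketch below adds a character outside that plane so the three encodings differ. The helper `units` and the variable `emoji` are hypothetical names introduced here.

```julia
s = "\u2200 x \u2203 y"
emoji = "\U1F600"   # a code point outside the Basic Multilingual Plane

# Number of code units needed in (UTF-8, UTF-16, UTF-32).
units(str) = (ncodeunits(str),
              length(transcode(UInt16, str)),
              length(transcode(UInt32, str)))

println(units(s))
println(units(emoji))

# Converting to UTF-16 and back reproduces the original string.
roundtrip = transcode(String, transcode(UInt16, emoji))
println(roundtrip == emoji)
```

The emoji takes four bytes in UTF-8, a surrogate pair (two code units) in UTF-16, and a single code unit in UTF-32.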
### Some Useful Functions

In [74]:
isascii("∀ x ∃ y"), isascii("abcd ef")

(false, true)

In [75]:
iscntrl('a'), iscntrl('\x1')

(false, true)

In [76]:
isdigit('a'), isdigit('9')

(false, true)

In [77]:
isxdigit('a'), isxdigit('x')

(true, false)

In [78]:
isletter('1'), isletter('a')

(false, true)

In [79]:
isnumeric('1'), isnumeric('௰') # '௰' is the Tamil numeral for ten

(true, true)

In [80]:
isuppercase('A'), islowercase('a')

(true, true)

In [81]:
isspace('\n'), isspace('\r'), isspace(' '), isspace('\x20')

(true, true, true, true)

## 7.5 Character Arrays

If you need to manipulate text character by character, it may be best to convert the `String` into a `Vector{Char}`.

In [82]:
collect("∀ x ∃ y")

7-element Vector{Char}:
 '∀': Unicode U+2200 (category Sm: Symbol, math)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 '∃': Unicode U+2203 (category Sm: Symbol, math)
 ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
 'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)
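
Once the characters are in a vector, they can be changed in place and joined back into a string. This is a minimal sketch; the particular edits (replacing the third character and reversing) are arbitrary choices for illustration.

```julia
chars = collect("∀ x ∃ y")

# A Vector{Char} is mutable and indexed by character position, not by byte.
chars[3] = 'z'      # replace the third character
reverse!(chars)     # reverse the vector in place

# join turns the characters back into a String.
result = join(chars)

println(result)
```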

## 7.6 Custom Strings

If the Unicode-based `String` type does not meet all your needs, you can implement your own string type as a subtype of `AbstractString`. If the character encoding you plan to use does not map to a Unicode `Char`, you can also create your own character type as a subtype of `AbstractChar`. The `LegacyStrings.jl` package contains sample implementations of such string types for reference.

In [83]:
eltype("abcd")

Char

The following command may take several minutes to complete if your environment has never been updated.

In [84]:
]add LegacyStrings

[32m[1m    Updating[22m[39m registry at `C:\Users\smile\.julia\registries\General`
[32m[1m    Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m  No Changes[22m[39m to `C:\Users\smile\OneDrive\Desktop\Tanvi\Woxen\Classes\Term 1\Python and Julia\Julia\Hands-on-Julia-Programming\Chapter 07\Project.toml`
[32m[1m  No Changes[22m[39m to `C:\Users\smile\OneDrive\Desktop\Tanvi\Woxen\Classes\Term 1\Python and Julia\Julia\Hands-on-Julia-Programming\Chapter 07\Manifest.toml`


In [85]:
using LegacyStrings

In [86]:
s = ASCIIString("abcd")

"abcd"

In [87]:
ncodeunits(s)

4

In [88]:
codeunit(s)

UInt8

In [89]:
s16 = UTF16String(transcode(UInt16, "abcd\0"))

"abcd"

In [90]:
codeunit(s16)

UInt16

In [91]:
typeof(s16)

UTF16String

In [92]:
ncodeunits(s16)

4

Both `UTF16String` and `ASCIIString` behave like collections of `Char`, while internally they store their data as 16-bit and 8-bit code units respectively. Hence, a string type derived from `AbstractString` does not necessarily need its own `AbstractChar` implementation.

In [93]:
eltype(s), eltype(s16)

(Char, Char)
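
To make the idea concrete, here is a minimal sketch of a custom string type. `CharVecString` is a hypothetical type invented for this illustration (it is not part of `LegacyStrings.jl`). It stores a `Vector{Char}`, treats every character as one 32-bit code unit, and implements only the small set of methods that the generic `AbstractString` fallbacks in `Base` rely on.

```julia
# Hypothetical string type: one code unit per character, stored as Char.
struct CharVecString <: AbstractString
    chars::Vector{Char}
end

# Number of code units; here it equals the number of characters.
Base.ncodeunits(s::CharVecString) = length(s.chars)

# The code unit type, and the code unit found at index i.
Base.codeunit(s::CharVecString) = UInt32
Base.codeunit(s::CharVecString, i::Int) = UInt32(s.chars[i])

# Every in-range index is a character boundary for this type.
Base.isvalid(s::CharVecString, i::Int) = 1 <= i <= length(s.chars)

# Iteration returns (character, next index), or nothing at the end.
function Base.iterate(s::CharVecString, i::Int = 1)
    i > length(s.chars) && return nothing
    return s.chars[i], i + 1
end

s = CharVecString(collect("∀ x"))

println(length(s))      # generic length, built on isvalid
println(s[1])           # generic getindex, built on iterate
println(eltype(s))      # still Char, no custom AbstractChar needed
println(uppercase(s))   # generic functions work and return a String
```

Because indices coincide with character positions, `s[2]` is valid here even though the same index is invalid for the equivalent `String`. The trade-off is four bytes of storage per character.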