Welcome to chapter one of Methods in Medical Informatics! This section explores how to parse and transform text files through seven scripts, each illustrating a different aspect of the task. Let's begin!

> Note: The content below is adapted from the book "Methods in Medical Informatics - Fundamentals of Healthcare Programming in Perl, Python, and Ruby" by Jules J. Berman.

# Random Numbers

Random numbers are used extensively in Monte Carlo simulations of biological events. Such simulations are also used in statistics (e.g., calculating normal distributions) and can even provide simple computational approaches to formal mathematical problems.

In [2]:
import random
for iterations in range(10):
    print(random.uniform(0,1))
exit()

0.4747047349331942
0.9476487505258883
0.23738329526226598
0.6006232213759787
0.3652683651977562
0.3402655835440779
0.002176777400353269
0.9138494543979068
0.08339174189045429
0.9662140744087262
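
As one illustration of a computational approach to a formal mathematical problem, the sketch below estimates pi by Monte Carlo sampling. It is not taken from the book: the function name `estimate_pi` and its `seed` parameter are assumptions made for this example. Points are drawn uniformly in the unit square, and the fraction that falls inside the quarter circle of radius 1 approximates pi/4.

```python
import random

def estimate_pi(num_points, seed=None):
    """Estimate pi from random points in the unit square (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so a fixed seed repeats the run
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(0, 1)
        y = rng.uniform(0, 1)
        # Count the point if it lies inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_points

print(estimate_pi(100_000, seed=1))
```

The estimate improves slowly as the number of points grows, so this is a teaching device rather than a practical way to compute pi.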


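## Script Algorithm

The random-number script at the start of this section works in three steps: it imports the random module, loops ten times, and on each pass prints a floating-point value between 0 and 1 drawn with random.uniform(0,1).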

# Converting Non-ASCII to Base64 ASCII

Almost every computer user has made the mistake of trying to view a non-ASCII file (such as a binary image or a word-processed file stored in a proprietary format) in a plain-text viewer. Python contains standard modules that will convert any file into Base64, a representation that uses only printable ASCII characters. We will use the base64 module when we start working with image data conveyed in XML files.

In [1]:
import base64
sample_file = open('latlon.bin', 'rb')
string = sample_file.read()
sample_file.close()
print(base64.encodebytes(string))
print(base64.encodebytes(base64.encodebytes(string)))
exit()

b'Qih1jkIoeA5CKHpuQih87kIof25CKIHuQiiETkIohs5CKIkuQiiLjkIoje5CKJBOQiiSrkIolO5C\nKJdOQiiZjkIom+5CKJ4uQiigbkIooq5CKKTuQiinLkIoqW5CKKuOQiitzkIor+5CKLIuQii0TkIo\ntm5CKLiOQii6jkIovK5CKL7OQijAzkIowu5CKMTuQijG7kIoyO5CKMruQijM7kIozs5CKNDOQijS\nzkIo1K5CKNaOQijYbkIo2k5CKNwuQijeDkIo3+5CKOGuQijjjkIo5U5CKOcOQijo7kIo6q5CKOxO\nQijuDkIo785CKPGOQijzLkIo9M5CKPaOQij4LkIo+c5CKPtuQij87kIo/o5CKQAuQikBrkIpAy5C\nKQTOQikGTkIpB85CKQlOQikKrkIpDC5CKQ2uQikPDkIpEG5CKRHuQikTTkIpFK5CKRYOQikXTkIp\nGK5CKRoOQikbTkIpHI5CKR3OQikfLkIpIG5CKSGOQikizkIpJA5CKSUuQikmbkIpJ45CKSiuQikp\nzkIpKu5CKSwOQiktLkIpLi5CKS9OQikwTkIpMW5CKTJuQikzbkIpNG5CKTVuQik2TkIpN05CKTgu\nQik5LkIpOg5CKTruQik7zkIpPK5CKT2OQik+bkIpPy5CKUAOQilAzkIpQY5CKUJuQilDLkIpQ85C\nKUSOQilFTkIpRg5CKUauQilHTkIpSA5CKUiuQilJTkIpSe5CKUpuQilLDkIpS65CKUwuQilMrkIp\nTU5CKU3OQilOTkIpTs5CKU8uQilPrkIpUA5CKVCOQilQ7kIpUU5CKVHOQilSDkIpUm5CKVLOQilT\nLkIpU25CKVPOQilUDkIpVE5CKVSOQilUzkIpVQ5CKVVOQilVbkIpVa5CKVXOQilV7kIpVi5CKVZO\nQilWbkIpVm5CKVaOQilWrkIpVq5CKVbOQilWzkIpVs5CKVbOQilWzkIpVs5CKV
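
A Base64 encoding is only useful if the original bytes can be recovered from it. The sketch below shows a round trip on a few in-memory bytes. The sample bytes are made up for this example and stand in for the contents of a binary file. Note that `base64.encodebytes` inserts a newline after every 76 output characters (the `\n` sequences visible in the output above), whereas `base64.b64encode` returns a single unbroken line.

```python
import base64

# Hypothetical sample bytes standing in for the contents of a binary file.
raw = bytes([0, 255, 16, 128, 66, 40, 117, 142])

encoded = base64.b64encode(raw)      # ASCII-only bytes, no line breaks
decoded = base64.b64decode(encoded)  # recovers the original bytes

print(encoded.decode('ascii'))
print(decoded == raw)

# The line-wrapped variant used in the script above decodes the same way.
wrapped = base64.encodebytes(raw)
print(base64.decodebytes(wrapped) == raw)
```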

# Creating a Universally Unique Identifier

In [10]:
import uuid
print(uuid.uuid4())
exit()

4ccbf33b-aa68-4de2-b9c3-37213a72c17c
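
`uuid.uuid4()` returns a randomly generated 128-bit identifier, printed as 36 characters (32 hexadecimal digits and 4 hyphens). As a minimal sketch of how such identifiers might be attached to records, the example below generates a few and parses one back. The helper name `new_record_id` is an assumption for illustration, not part of the original script.

```python
import uuid

def new_record_id():
    """Return a random (version 4) UUID as a string (hypothetical helper)."""
    return str(uuid.uuid4())

# Generate a few identifiers; each call yields a different value.
record_ids = [new_record_id() for _ in range(3)]
for record_id in record_ids:
    print(record_id)

# A UUID string can be parsed back into a UUID object and inspected.
parsed = uuid.UUID(record_ids[0])
print(parsed.version)
```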


# Splitting Text into Sentences

In [2]:
import re
all_text = 'I am here. You are here. We are all here.'
sentence_list = re.split(r'[\.\!\?] +(?=[A-Z])', all_text)
print('\n'.join(sentence_list))

I am here
You are here
We are all here.
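
The split pattern above consumes the punctuation mark it matches, which is why the first two sentences in the output lose their periods. One possible variation, sketched below, moves the punctuation into a lookbehind so that only the spaces are removed. The function name `split_sentences` is an assumption for this example.

```python
import re

def split_sentences(text):
    """Split on spaces that follow . ! or ? and precede a capital letter (hypothetical helper)."""
    # (?<=[.!?]) checks for the punctuation without consuming it,
    # so each sentence keeps its own terminal mark.
    return re.split(r'(?<=[.!?]) +(?=[A-Z])', text)

all_text = 'I am here. You are here. We are all here.'
for sentence in split_sentences(all_text):
    print(sentence)
```

Like the original pattern, this one is a heuristic: it will also split after abbreviations such as "Dr." when they are followed by a capitalized word.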


# One-Way Hash on a Name

In [1]:
import hashlib
line = input('What is your full name?\n')
line = line.encode('utf-8')  # hashlib operates on bytes, not str
md5_object = hashlib.md5()
md5_object.update(line)
print(md5_object.hexdigest())
exit()

What is your full name?
Eric Roger Cui
e13005d38bcf61aef5458a88362cb245
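
A one-way hash changes completely when its input changes even slightly, so "Jane Doe" and "jane  doe" produce unrelated digests. The sketch below normalizes case and spacing before hashing, which is an assumption of this example and not something the original script does. The helper name `hash_name`, its `algorithm` parameter, and the sample names are likewise hypothetical. The `algorithm` argument also allows another hashlib algorithm, such as SHA-256, to be swapped in for MD5.

```python
import hashlib

def hash_name(full_name, algorithm='md5'):
    """Return the hex digest of a normalized name (hypothetical helper)."""
    # Assumed normalization: collapse runs of whitespace and lowercase the name.
    normalized = ' '.join(full_name.split()).lower()
    digest = hashlib.new(algorithm)
    digest.update(normalized.encode('utf-8'))
    return digest.hexdigest()

print(hash_name('Jane Doe'))
print(hash_name('jane   doe'))          # same digest as the line above
print(hash_name('Jane Doe', 'sha256'))  # longer digest from a different algorithm
```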


# One-Way Hash on a File

In [1]:
import hashlib
md5_object = hashlib.md5()
with open('us.gif', 'rb') as sample_file:  # read the image as raw bytes
    contents = sample_file.read()
md5_object.update(contents)
# hexdigest() returns lowercase hex; upper() gives the uppercase form shown below
print(md5_object.hexdigest().upper())
exit()

39842F5ED1516D7C541155FD2B093B36
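
The script above reads the whole file into memory before hashing it. For large files, a common alternative is to feed the hash object one chunk at a time. The sketch below assumes a helper named `md5_of_file` and a chunk size of 64 KB, both chosen for illustration, and demonstrates it on a temporary file rather than on `us.gif`.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=65536):
    """Return the uppercase MD5 hex digest of a file, read in chunks (hypothetical helper)."""
    md5_object = hashlib.md5()
    with open(path, 'rb') as handle:
        # iter() with a sentinel keeps calling read() until it returns b'' at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            md5_object.update(chunk)
    return md5_object.hexdigest().upper()

# Demonstration on a temporary file holding made-up bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'sample bytes' * 1000)
    demo_path = tmp.name

print(md5_of_file(demo_path))
os.remove(demo_path)
```

Calling `update()` repeatedly on successive chunks yields the same digest as a single `update()` on the concatenated bytes, so the result matches the whole-file approach.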


# Prime Number Generator

In [10]:
import math
print('2,3,', end='')
for i in range(4, 10000):
    upper = int(math.sqrt(i))
    state = 1  # assume i is prime until a divisor is found
    # Test every candidate divisor up to and including the square root.
    for thing in range(2, upper + 1):
        if i % thing == 0:
            state = 0
            break
    if state == 1:
        print(i, end=',')
exit()

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997,1009,1013,1019,1021,1031,1033,1039,1049,1051,1061,1063,1069,1087,1091,1093,1097,1103,1109,1117,1123,1129,1151,1153,1163,1171,1181,1187,1193,1201,1213,1217,1223,1229,1231,1237,1249,1259,1277,1279,1283,1289,1291,1297,1301,1303,1307,1319,1321,1327,1361,1367,1373,1381,1399,1409,1423,1427,1429,...
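
The script above tests each candidate by trial division. A different technique for the same task is the Sieve of Eratosthenes, which crosses out the multiples of each prime instead of dividing. The sketch below is offered as a point of comparison and is not the book's method. The function name `primes_below` is an assumption for this example.

```python
import math

def primes_below(limit):
    """Return all primes smaller than limit using the Sieve of Eratosthenes (hypothetical helper)."""
    if limit < 2:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Smaller multiples of n were already crossed out by smaller primes,
            # so crossing out can start at n * n.
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

print(primes_below(30))
print(len(primes_below(10000)))
```

There are 1229 primes below 10000, which gives a quick check on either approach.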

# Exercises