# Notes on how to write a memory-efficient Python program
https://medium.com/datadriveninvestor/how-does-memory-allocation-work-in-python-and-other-languages-d2d8a9398543

## I. Measure memory in Jupyter
1. %memit magic command
https://timothymonteath.com/articles/monitoring_memory_usage/ <br>
pip install memory_profiler

%memit is a magic command that measures the memory used by a single Python statement (%%memit is the cell-level variant, for a whole cell) <br>
%load_ext memory_profiler <br>
%memit import numpy as np

2. resource standard-library module (Unix only)
https://docs.python.org/2/library/resource.html

3. sys.getsizeof (shallow size of a single object, in bytes)
https://docs.python.org/3/library/sys.html#sys.getsizeof
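As a rough sketch of option 2, the peak resident set size of the current process can be read with `resource.getrusage`. The helper name `peak_rss` and the size of the test allocation are illustrative choices, not part of the original notes. The module is not available on Windows, and the unit of `ru_maxrss` depends on the platform (kilobytes on Linux, bytes on macOS).

```python
import resource  # Unix-only standard-library module (not available on Windows)


def peak_rss():
    """Return the peak resident set size of this process (platform-dependent unit)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss()
data = [0] * 2_000_000  # allocate a list with two million slots
after = peak_rss()

# ru_maxrss is a high-water mark, so it can only stay the same or grow.
print(f"peak RSS before: {before}, after: {after}")
```

Because the value is a high-water mark for the whole process, it is better suited to checking the overall footprint of a script than to measuring a single statement.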

In [6]:
%load_ext memory_profiler
%memit a = range(0, 10)

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler
peak memory: 51.91 MiB, increment: 0.00 MiB


In [7]:
a = range(0, 10)
print(a)
print(type(a))

range(0, 10)
<class 'range'>


In [8]:
import sys
sys.getsizeof(a)

48
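The 48 bytes above are the size of the range object itself, which stays the same however many numbers the range covers. More generally, `sys.getsizeof` is shallow: for a container it counts the container's own storage but not the objects it refers to. A minimal sketch of a recursive variant follows; `deep_getsizeof` is a hypothetical helper written for these notes and only handles the basic built-in containers.

```python
import sys


def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and the items it contains."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


words = [str(i) * 10 for i in range(1000)]
shallow = sys.getsizeof(words)   # list header plus 1000 references
deep = deep_getsizeof(words)     # also counts the 1000 string objects
print(shallow, deep)

# A range object has a fixed size, independent of its length.
print(sys.getsizeof(range(10)) == sys.getsizeof(range(10**9)))
```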

## Practice 1: Try Not To Blow Off Memory!
https://www.codementor.io/satwikkansal/python-practices-for-efficient-code-performance-memory-and-usability-aze6oiq65 <br>
Unlike in C/C++, Python's interpreter manages memory for you, so you have little direct control over allocation and deallocation. Still, understanding how things work, and knowing the different ways to do the same thing, can help you minimize your program's memory usage.
* Use generators to calculate large sets of results
https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/ <br>
https://www.freecodecamp.org/news/how-and-why-you-should-use-python-generators-f6fb56650888/
* Don't build long strings with repeated +; use ''.join(iterable_object)
* Use `__slots__` when defining a Python class
https://stackoverflow.com/questions/472000/usage-of-slots
* Python Idioms and efficiency
https://www.memonic.com/user/pneff/folder/python/id/1bufp
https://google.github.io/styleguide/pyguide.html
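* Use built-in functions whenever possible
* Create smaller functions, so each variable has a shorter lifetime between creation and being dereferenced when the namespace is removed at function exit.
* Calling gc.collect() yourself at the end of a loop can help avoid fragmenting memory, which in turn helps keep performance up. I've seen this make a significant difference (~20% runtime IIRC), but measure it on your own workload.
* Some methods for reducing memory
https://habr.com/en/post/458518/
* Memory usage as reported by sys.getsizeof: np.array < tuple < list. Keep in mind that sys.getsizeof is shallow: for a list or tuple it counts only the references held by the container, not the objects they point to.
* Memory usage as reported by sys.getsizeof: list-of-arrays < array-of-arrays (again a shallow measurement; the inner arrays are not counted)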
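The string-building advice in the list above can be sketched as follows. Both function names are made up for this illustration. Repeated `+` can create a new intermediate string on each iteration, while `''.join` builds the result in one pass.

```python
def build_with_plus(parts):
    """Concatenate with +; each step may allocate a new, longer string."""
    result = ""
    for part in parts:
        result = result + part + "\n"
    return result


def build_with_join(parts):
    """Let str.join assemble the final string from a lazy sequence of pieces."""
    return "".join(part + "\n" for part in parts)


parts = [f"line {i}" for i in range(5)]
print(build_with_plus(parts) == build_with_join(parts))
```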

In [30]:
# list vs tuple vs np.array
import sys
import numpy as np
a = list(range(20))
b = tuple(range(20))
c = np.array(range(20))
print(type(a), type(b), type(c))
print(sys.getsizeof(a))
print(sys.getsizeof(b))
print(sys.getsizeof(c))
print(a)
print(b)
print(c)
print(a[2])
print(b[2])

<class 'list'> <class 'tuple'> <class 'numpy.ndarray'>
288
208
176
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
2
2


In [42]:
# array-of-arrays VS. list-of-arrays VS. list-of-lists
a = np.array([np.array(range(3)), np.array(range(5))], dtype=object)  # ragged input needs dtype=object on recent NumPy
b = list([np.array(range(3)), np.array(range(5))])
c = list([list(range(3)), list(range(5))])
print(a)
print(b)
print(c)
print(type(a), type(b), type(c))
print(sys.getsizeof(a))
print(sys.getsizeof(b))
print(sys.getsizeof(c))

[array([0, 1, 2]) array([0, 1, 2, 3, 4])]
[array([0, 1, 2]), array([0, 1, 2, 3, 4])]
[[0, 1, 2], [0, 1, 2, 3, 4]]
<class 'numpy.ndarray'> <class 'list'> <class 'list'>
112
104
104
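The three sizes above are shallow, so they mostly reflect a handful of references rather than the data itself. For regular (non-ragged) numeric data, a fairer comparison is between one contiguous array and a nested list with all of its elements counted. The sketch below assumes a 100 x 100 grid of floats; the variable names and the grid size are illustrative only.

```python
import sys
import numpy as np

rows, cols = 100, 100
nested = [[float(i * cols + j) for j in range(cols)] for i in range(rows)]
matrix = np.array(nested, dtype=np.float64)

# Data buffer of the array: rows * cols * 8 bytes for float64.
matrix_bytes = matrix.nbytes

# Outer list + every inner list + every float object.
nested_bytes = (
    sys.getsizeof(nested)
    + sum(sys.getsizeof(row) for row in nested)
    + sum(sys.getsizeof(x) for row in nested for x in row)
)

print(matrix_bytes, nested_bytes)
```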


### Note with list
* Avoid list slicing where possible: a slice builds a new list that copies the references in the selected range.
* Prefer "for item in array" over "for index in range(len(array))" when looping over arrays, to save space and time.
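One way to follow the first point, sketched here with `itertools.islice`: it walks the requested range lazily instead of copying it into a new list. The list size and bounds are arbitrary example values.

```python
from itertools import islice

values = list(range(1_000_000))

# values[10:500_000] builds a second list holding roughly 500k references.
total_slice = sum(values[10:500_000])

# islice visits the same items without building the intermediate list.
total_islice = sum(islice(values, 10, 500_000))

print(total_slice == total_islice)

# Direct iteration; enumerate supplies the index when it is really needed.
for index, item in enumerate(values[:3]):
    print(index, item)
```

Note that `islice` still has to step over the skipped leading items, so for a slice near the end of a very long list a plain slice may be faster even though it uses more memory.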

### 1. Do not build one huge string for the whole file body; write it piece by piece with "writelines"

In [None]:
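# use writelines with a generator, so the full file body is never built as one string
with open(filename, 'w') as f:
    f.writelines((datum + '\n') for datum in data)

# Even better, you could write the file as:
items = GetData(url)
with open(filename, 'w') as f:
    for item in items:
        # in text mode '\n' is translated to the platform line ending,
        # so os.linesep should not be written here
        f.write(';'.join(item) + '\n')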

### 2. Use `__slots__` when defining a Python class

In [14]:
import sys, os, glob, time, re
import numpy as np
sys.path.append("D:/work/Tool_codes/Tools_Python/FinalCode_py")
import thaFileLammps, thaTool, thaModel

# #### read DATA/Dump
du = thaFileLammps.DUMP()
du.fReadDump('SRO_propa_Al_z001_810K_1000.cfg')

In [15]:
sys.getsizeof(du)

56
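The 56 bytes reported above are again a shallow size: they do not include the instance's attribute dictionary or the data it refers to. What `__slots__` changes can be sketched with two small classes. `PointDict` and `PointSlots` are hypothetical names used only for this illustration and are unrelated to the `thaFileLammps` module.

```python
import sys


class PointDict:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Slotted class: fixed attribute set, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointDict(1, 2)
q = PointSlots(1, 2)

dict_total = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slots_total = sys.getsizeof(q)
print(dict_total, slots_total)

# The trade-off: a slotted instance rejects attributes not listed in __slots__.
try:
    q.z = 3
except AttributeError as exc:
    print("cannot add attribute:", exc)
```

The saving per instance is small, so this mainly pays off when a program creates a very large number of instances of the same class.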

# II. Using generators
* Iterables are objects you can get an iterator from (lists, strings, dictionaries, tuples, sets, ...); in other words, objects with a built-in protocol for visiting each element in a certain order.
* Iterators are objects that let you step through an iterable one element at a time.
* A generator is a lazy "pending list" object: it produces its values one at a time, on demand. A generator is both an iterable and an iterator.
* A yield statement inside a function makes a call to that function return a generator.
https://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do?page=1&tab=votes#tab-top

generator = myYieldingFunction(...)
x = list(generator)

   generator
       v
[x[0], ..., ???]

         generator
             v
[x[0], x[1], ..., ???]

               generator
                   v
[x[0], x[1], x[2], ..., ???]

                       StopIteration exception
[x[0], x[1], x[2]]     done

list==[x[0], x[1], x[2]]
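The terms above can be checked directly with the built-ins `iter()` and `next()`. This is a small sketch; `countdown` is a made-up generator function.

```python
numbers = [10, 20, 30]   # a list is an iterable
it = iter(numbers)       # iter() returns an iterator over it
first = next(it)         # 10
rest = list(it)          # the remaining items: [20, 30]


def countdown(n):
    """Generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
print(iter(gen) is gen)  # a generator is its own iterator
values = list(gen)       # consumes the generator: [3, 2, 1]
leftover = list(gen)     # already exhausted: []
print(first, rest, values, leftover)
```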

## II.1. Eager evaluation vs. lazy evaluation (return vs. yield)
* Using a list: all elements are built and stored at once, so memory use grows with the length of the list.
* Using a generator: the size of the generator object is fixed, independent of how many items it will produce.
https://anandology.com/python-practice-book/iterators.html

In [18]:
lista = [1, 2, 3]
listb = [x*x for x in range(300000)]
mygenerator = (x*x for x in range(300000))
print(lista)
# print(listb)
print(mygenerator)

[1, 2, 3]
<generator object <genexpr> at 0x00000255D9F8F8C8>


In [19]:
import sys
print(sys.getsizeof(listb))
print(sys.getsizeof(mygenerator))

2678096
120
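`sys.getsizeof` only shows the size of the two objects. To compare the peak memory actually allocated while the values are consumed, one option is the standard-library `tracemalloc` module. The helper `peak_bytes` below is a hypothetical wrapper written for these notes.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    tracemalloc.stop()
    return peak


# Eager: the whole list exists before sum() starts.
list_peak = peak_bytes(lambda: sum([x * x for x in range(300_000)]))

# Lazy: sum() pulls one value at a time from the generator.
gen_peak = peak_bytes(lambda: sum(x * x for x in range(300_000)))

print(list_peak, gen_peak)
```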


## II.2. generator function (yield)
* "return" sends a single value back to the caller and ends the function, whereas "yield" can produce a sequence of values, one at a time. Use yield when you want to iterate over a sequence but don't want to store the entire sequence in memory. <br>
* yield is used in Python generators. A generator function is defined like a normal function, but whenever it needs to produce a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function.

In [20]:
def simpleGeneratorFun(): 
    yield 1
    yield 2
    yield 3
  
# Driver code to check above generator function 
for value in simpleGeneratorFun():  
    print(value) 

1
2
3
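A slightly more practical sketch of the same idea: a generator function that filters lines from a stream one at a time, so the full set of records never has to be held in memory. The function name `read_records` and the sample text are assumptions for illustration; an `io.StringIO` stands in for a real file.

```python
import io


def read_records(stream):
    """Yield one stripped, non-empty line at a time instead of returning a full list."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


stream = io.StringIO("alpha\n\nbeta\ngamma\n")
records = read_records(stream)

first = next(records)      # nothing is read until a value is requested
remaining = list(records)  # consume the rest
print(first, remaining)
```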


## II.3. Accessing items in a generator
https://stackoverflow.com/questions/2300756/get-the-nth-item-of-a-generator-in-python
### access whole generator:

In [21]:
simpleGen = (2*x for x in range(5))
for item in simpleGen:
    print(item)   

0
2
4
6
8


In [22]:
simpleGen = (2*x for x in range(5))
for i,item in enumerate(simpleGen):
    print(i, item) 

0 0
1 2
2 4
3 6
4 8


### access 1 item in generator:

In [26]:
from itertools import islice
simpleGen = (2*x for x in range(5))
n=2
next(islice(simpleGen, n, n+1))

4

In [33]:
simpleGen = (2*x for x in range(5))
next(x for i,x in enumerate(simpleGen) if i==1)

2

In [36]:
simpleGen = (2*x for x in range(5))
next(x for i,x in enumerate(simpleGen) if i==3)

6

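* Note: a generator cannot be used more than once. One solution is to copy it with itertools.tee.
* But tee has to buffer every item that one copy has consumed and another has not; if there are many copies, or one copy runs far ahead of the others, the buffered data effectively becomes a list again.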
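A minimal sketch of `itertools.tee`, used here to walk a generator as overlapping pairs. The two copies advance in lockstep, one item apart, so the internal buffer stays small. The variable names are illustrative.

```python
from itertools import tee

source = (2 * x for x in range(5))

# Two independent iterators over the same underlying generator.
current, following = tee(source, 2)
next(following, None)  # advance the second copy by one item

pairs = list(zip(current, following))
print(pairs)
```

If one copy were consumed completely before the other started, tee would have to hold every item in its buffer, and building a list with `list(source)` would be the simpler choice.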