In [4]:
cpp="""In C++, a struct (short for structure) is a user-defined data type that groups related variables (of different data types) under one name. It is similar to a class 
but has some key differences. By default, the members of a struct are public, whereas in a class, they are private.
Members are public by default
Unlike classes, struct members do not require an access specifier to be accessed outside the struct.

Can have member functions
A struct in C++ can contain both data members and functions.

Struct vs Class

struct is typically used for simple data structures, whereas class is preferred for complex data encapsulation and object-oriented programming.

In a struct, members are public by default, whereas in a class, they are private by default.

What is static in C++?
In C++, the static keyword is used to define variables or functions that belong to the class itself, rather than to any specific object of the class. This means that:

A static variable retains its value across multiple function calls.

A static function can be called without creating an object of the class.
    

Why use static?
Memory Efficiency: Static members are stored in a fixed location in memory (not duplicated for every object).

Access Without an Object: You can call a static method or access a static variable without creating an instance of the class.



Shared Data: All objects of the class share the same static member, making it useful for counters, constants, or utility functions.

static is useful for tracking shared data (e.g., total accounts, user count, object instances).
static functions can be called without creating an object.
It reduces memory usage by ensuring only one copy of the variable exists."""

In [2]:
# step 1: tokenize the text
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [5]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts([cpp])

In [6]:
tokenizer.word_index

{'a': 1,
 'static': 2,
 'class': 3,
 'the': 4,
 'in': 5,
 'struct': 6,
 'is': 7,
 'data': 8,
 'of': 9,
 'object': 10,
 'for': 11,
 'members': 12,
 'are': 13,
 'to': 14,
 'by': 15,
 'an': 16,
 'can': 17,
 'functions': 18,
 'c': 19,
 'default': 20,
 'without': 21,
 'that': 22,
 'it': 23,
 'public': 24,
 'whereas': 25,
 'access': 26,
 'be': 27,
 'or': 28,
 'variable': 29,
 'creating': 30,
 'memory': 31,
 'user': 32,
 'variables': 33,
 'one': 34,
 'they': 35,
 'private': 36,
 'not': 37,
 'member': 38,
 'and': 39,
 'used': 40,
 'function': 41,
 'called': 42,
 'shared': 43,
 'useful': 44,
 'short': 45,
 'structure': 46,
 'defined': 47,
 'type': 48,
 'groups': 49,
 'related': 50,
 'different': 51,
 'types': 52,
 'under': 53,
 'name': 54,
 'similar': 55,
 'but': 56,
 'has': 57,
 'some': 58,
 'key': 59,
 'differences': 60,
 'unlike': 61,
 'classes': 62,
 'do': 63,
 'require': 64,
 'specifier': 65,
 'accessed': 66,
 'outside': 67,
 'have': 68,
 'contain': 69,
 'both': 70,
 'vs': 71,
 'typically'
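The ordering of `word_index` comes from word frequency. Below is a minimal pure-Python sketch of what `fit_on_texts` does under the default settings, assuming lowercasing and the default `filters` string. Every filtered character becomes a space, which is why `C++` turns into the token `c` and `user-defined` into `user` and `defined`. Words are then counted and numbered from 1 in descending frequency, with ties kept in first-seen order. Index 0 is never assigned because it is reserved for padding. `text_to_words` and `build_word_index` are hypothetical names, not Keras functions.

```python
# Default Keras Tokenizer filters (assumed here): punctuation, tab and newline.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_words(text, filters=DEFAULT_FILTERS):
    """Lowercase, replace every filter character with a space, split on spaces."""
    table = str.maketrans({ch: " " for ch in filters})
    return [w for w in text.lower().translate(table).split(" ") if w]


def build_word_index(texts):
    """Hypothetical re-implementation: rank words by count, most frequent first."""
    counts = {}  # dicts keep insertion order, so ties stay in first-seen order
    for text in texts:
        for word in text_to_words(text):
            counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)  # stable sort
    # Indices start at 1; 0 is reserved for padding.
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


sample = "In C++, a struct is a type. A struct groups data."
print(text_to_words(sample))
# ['in', 'c', 'a', 'struct', 'is', 'a', 'type', 'a', 'struct', 'groups', 'data']
print(build_word_index([sample]))
# {'a': 1, 'struct': 2, 'in': 3, 'c': 4, 'is': 5, 'type': 6, 'groups': 7, 'data': 8}
```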

In [8]:
# step 2: split the text into lines (each line is treated as one "sentence")

for sentence in cpp.split('\n'):
    print(sentence.strip())

In C++, a struct (short for structure) is a user-defined data type that groups related variables (of different data types) under one name. It is similar to a class
but has some key differences. By default, the members of a struct are public, whereas in a class, they are private.
Members are public by default
Unlike classes, struct members do not require an access specifier to be accessed outside the struct.

Can have member functions
A struct in C++ can contain both data members and functions.

Struct vs Class

struct is typically used for simple data structures, whereas class is preferred for complex data encapsulation and object-oriented programming.

In a struct, members are public by default, whereas in a class, they are private by default.

What is static in C++?
In C++, the static keyword is used to define variables or functions that belong to the class itself, rather than to any specific object of the class. This means that:

A static variable retains its value across multiple f
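Splitting on `'\n'` yields lines rather than sentences. For example, the sentence "It is similar to a class but has some key differences." is broken across the first two lines, and because n-grams are built per line, no training sample ever sees "class" followed by "but". If sentence-level samples are wanted, a minimal regex-based splitter could be used instead. This is a hypothetical alternative (`split_sentences` is not part of the original notebook) and it assumes sentences end in `.`, `?` or `!`; heading lines without a final period, such as "Members are public by default", would be merged into the following sentence.

```python
import re


def split_sentences(text):
    """Hypothetical helper: join wrapped lines, then split after . ? or !"""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    return [s for s in re.split(r"(?<=[.?!])\s+", flat) if s]


demo = ("It is similar to a class\n"
        "but has some key differences. By default, members are public.")
for s in split_sentences(demo):
    print(s)
# It is similar to a class but has some key differences.
# By default, members are public.
```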

In [9]:
# step 3: convert each line to a sequence of word indices
for sentence in cpp.split('\n'):
    sequence = tokenizer.texts_to_sequences([sentence])
    print(f"Sentence: {sentence.strip()}")
    print(f"Sequence: {sequence[0]}")
    print("----")

Sentence: In C++, a struct (short for structure) is a user-defined data type that groups related variables (of different data types) under one name. It is similar to a class
Sequence: [5, 19, 1, 6, 45, 11, 46, 7, 1, 32, 47, 8, 48, 22, 49, 50, 33, 9, 51, 8, 52, 53, 34, 54, 23, 7, 55, 14, 1, 3]
----
Sentence: but has some key differences. By default, the members of a struct are public, whereas in a class, they are private.
Sequence: [56, 57, 58, 59, 60, 15, 20, 4, 12, 9, 1, 6, 13, 24, 25, 5, 1, 3, 35, 13, 36]
----
Sentence: Members are public by default
Sequence: [12, 13, 24, 15, 20]
----
Sentence: Unlike classes, struct members do not require an access specifier to be accessed outside the struct.
Sequence: [61, 62, 6, 12, 63, 37, 64, 16, 26, 65, 14, 27, 66, 67, 4, 6]
----
Sentence: 
Sequence: []
----
Sentence: Can have member functions
Sequence: [17, 68, 38, 18]
----
Sentence: A struct in C++ can contain both data members and functions.
Sequence: [1, 6, 5, 19, 17, 69, 70, 8, 12, 39, 18]
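`texts_to_sequences` replaces each word with its index in `word_index`. With no `oov_token` set (as here), words that are not in the vocabulary are silently dropped rather than mapped to a placeholder, which matters later when new text is fed to the model. A minimal pure-Python sketch of that lookup, assuming a small hypothetical vocabulary and a simplified regex word splitter in place of Keras's filter list:

```python
import re


def to_sequence(text, word_index):
    """Hypothetical helper: map words to indices, skipping unknown words."""
    words = re.findall(r"[a-z0-9']+", text.lower())  # simplified word split
    return [word_index[w] for w in words if w in word_index]


demo_index = {"a": 1, "static": 2, "class": 3, "struct": 6}
print(to_sequence("A struct is a class", demo_index))  # [1, 6, 1, 3] ("is" is dropped)
```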

In [10]:
# step 4: build the n-gram training sequences (every prefix of length >= 2 of each line)
input_sequences = []
for sentence in cpp.split('\n'):
    sequence = tokenizer.texts_to_sequences([sentence])[0]
    
    for i in range(1, len(sequence)):
        n_gram_sequence = sequence[:i+1]
        input_sequences.append(n_gram_sequence)


In [11]:
input_sequences[:5]  # Display the first 5 sequences

[[5, 19], [5, 19, 1], [5, 19, 1, 6], [5, 19, 1, 6, 45], [5, 19, 1, 6, 45, 11]]
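Each line with n tokens contributes n - 1 training samples: every prefix of length 2 or more, whose last token later serves as the label. The first line has 30 tokens, which is why the maximum length below is 30. A pure-Python sketch of the same loop, written as a hypothetical helper `make_prefixes`:

```python
def make_prefixes(sequence):
    """Return every prefix of length >= 2 (same as the loop in step 4)."""
    return [sequence[: i + 1] for i in range(1, len(sequence))]


print(make_prefixes([5, 19, 1, 6]))
# [[5, 19], [5, 19, 1], [5, 19, 1, 6]]
```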

In [13]:
max(len(x) for x in input_sequences)  # length of the longest sequence (used as maxlen below)

30

In [15]:
# step 5: pad the sequences
from tensorflow.keras.preprocessing.sequence import pad_sequences

# 'pre' puts the zeros on the left, so the last column always holds each sequence's final token
padded_input = pad_sequences(input_sequences, maxlen=30, padding='pre')

In [16]:
padded_input

array([[  0,   0,   0, ...,   0,   5,  19],
       [  0,   0,   0, ...,   5,  19,   1],
       [  0,   0,   0, ...,  19,   1,   6],
       ...,
       [  0,   0,   0, ..., 128,   9,   4],
       [  0,   0,   0, ...,   9,   4,  29],
       [  0,   0,   0, ...,   4,  29, 129]])
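`pad_sequences(..., padding='pre')` left-fills each sequence with zeros up to `maxlen`, so the final token (the label) always sits in the last column with the most recent context just before it. Sequences longer than `maxlen` are cut from the front by default (`truncating='pre'`). A minimal pure-Python sketch of that behaviour, with `pad_pre` as a hypothetical name:

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value`; keep only the last `maxlen` tokens of long sequences."""
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # truncate from the front, like truncating='pre'
        padded.append([value] * (maxlen - len(seq)) + list(seq))
    return padded


print(pad_pre([[5, 19], [5, 19, 1], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 19], [0, 5, 19, 1], [3, 4, 5, 6]]
```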

In [17]:
X = padded_input[:, :-1]  # Features (input sequences)
y = padded_input[:, -1]   # Labels (next words)

In [18]:
X

array([[  0,   0,   0, ...,   0,   0,   5],
       [  0,   0,   0, ...,   0,   5,  19],
       [  0,   0,   0, ...,   5,  19,   1],
       ...,
       [  0,   0,   0, ...,  34, 128,   9],
       [  0,   0,   0, ..., 128,   9,   4],
       [  0,   0,   0, ...,   9,   4,  29]])
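Slicing off the last column turns each padded row into an (input, target) pair: the first 29 columns are the context and column 30 is the word to predict. That is why `X` has 29 columns, and why text fed to the trained model should be padded to `X.shape[1]` (29) rather than 30. The same split on plain lists, using a hypothetical helper and demo names chosen so they do not overwrite the notebook's `X` and `y`:

```python
def split_features_labels(padded_rows):
    """Hypothetical helper: all but the last column -> features, last column -> label."""
    features = [row[:-1] for row in padded_rows]
    labels = [row[-1] for row in padded_rows]
    return features, labels


rows = [[0, 0, 5, 19], [0, 5, 19, 1], [5, 19, 1, 6]]
X_demo, y_demo = split_features_labels(rows)
print(X_demo)  # [[0, 0, 5], [0, 5, 19], [5, 19, 1]]
print(y_demo)  # [19, 1, 6]
```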

In [19]:
y

array([ 19,   1,   6,  45,  11,  46,   7,   1,  32,  47,   8,  48,  22,
        49,  50,  33,   9,  51,   8,  52,  53,  34,  54,  23,   7,  55,
        14,   1,   3,  57,  58,  59,  60,  15,  20,   4,  12,   9,   1,
         6,  13,  24,  25,   5,   1,   3,  35,  13,  36,  13,  24,  15,
        20,  62,   6,  12,  63,  37,  64,  16,  26,  65,  14,  27,  66,
        67,   4,   6,  68,  38,  18,   6,   5,  19,  17,  69,  70,   8,
        12,  39,  18,  71,   3,   7,  72,  40,  11,  73,   8,  74,  25,
         3,   7,  75,  11,  76,   8,  77,  39,  10,  78,  79,   1,   6,
        12,  13,  24,  15,  20,  25,   5,   1,   3,  35,  13,  36,  15,
        20,   7,   2,   5,  19,  19,   4,   2,  81,   7,  40,  14,  82,
        33,  28,  18,  22,  83,  14,   4,   3,  84,  85,  86,  14,  87,
        88,  10,   9,   4,   3,  89,  90,  22,   2,  29,  91,  92,  93,
        94,  95,  41,  96,   2,  41,  17,  27,  42,  21,  30,  16,  10,
         9,   4,   3,  98,   2,  99,   2,  12,  13, 100,   5,   

In [20]:
X.shape

(265, 29)

In [21]:
y.shape

(265,)

In [22]:
# step 6: convert the labels to one-hot encoding
from tensorflow.keras.utils import to_categorical

y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)

In [23]:
y.shape

(265, 130)

In [24]:
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])
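`to_categorical` turns each integer label into a row with a single 1. `num_classes` is `len(tokenizer.word_index) + 1` (129 words + 1 = 130) because index 0 is reserved for padding and never corresponds to a word; the same number fixes the width of the model's output layer. A pure-Python sketch, with `one_hot` as a hypothetical name:

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} out of range for {num_classes} classes")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


print(one_hot([2, 1], num_classes=4))
# [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
```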

In [27]:
# step 7: create the model (an LSTM, a type of RNN)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, LSTM

In [28]:
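from tensorflow.keras.layers import Input

model = Sequential()
# declare the input length explicitly: Keras 3 deprecates Embedding's input_length and ignores it
model.add(Input(shape=(X.shape[1],)))
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=10))
model.add(LSTM(100))
model.add(Dense(len(tokenizer.word_index) + 1, activation='softmax'))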

In [29]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [31]:
model.summary()
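The summary output is not reproduced here. Assuming the layer sizes in the model cell (a 130-entry vocabulary including padding, 10-dimensional embeddings, 100 LSTM units, a 130-way softmax) and Keras's standard LSTM with a single bias vector per gate, the trainable parameter counts can be worked out by hand. The helper functions are hypothetical; `model.summary()` on the built model is what to trust. Note that about 59k parameters is large compared with the 265 training samples.

```python
def embedding_params(vocab_size, output_dim):
    return vocab_size * output_dim  # one vector per token id


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    return input_dim * units + units  # weights + bias


vocab = 130
param_counts = {
    "embedding": embedding_params(vocab, 10),
    "lstm": lstm_params(10, 100),
    "dense": dense_params(100, vocab),
}
print(param_counts, sum(param_counts.values()))
# {'embedding': 1300, 'lstm': 44400, 'dense': 13130} 58830
```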

In [32]:
model.fit(X, y, epochs=100, verbose=1)

Epoch 1/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 3s 16ms/step - accuracy: 0.0165 - loss: 4.8667
Epoch 2/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.0375 - loss: 4.8515
Epoch 3/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.0329 - loss: 4.7470
Epoch 4/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.0276 - loss: 4.5933
Epoch 5/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.0364 - loss: 4.4798
Epoch 6/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0305 - loss: 4.5225
Epoch 7/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.0472 - loss: 4.4838
Epoch 8/100
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.0360 - loss: 4.5376
Epoch 9/100
9/9 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1e23bf6fc20>
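A useful sanity check on the first epoch: a model that spreads its probability evenly over all 130 classes has a categorical cross-entropy of -ln(1/130) = ln(130) ≈ 4.868, which matches the epoch-1 loss of 4.8667. In other words, the network starts at chance level, and any drop below that value is what training achieves. A quick check (the helper name is illustrative):

```python
import math


def uniform_cross_entropy(num_classes):
    """Cross-entropy of a uniform prediction: -log(1 / num_classes)."""
    return -math.log(1.0 / num_classes)


print(round(uniform_cross_entropy(130), 4))  # 4.8675
```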

In [35]:
text = "struct"

# tokenize the input text
sequence = tokenizer.texts_to_sequences([text])[0]

# pad to the length the model was trained on (X.shape[1] == 29, one less than maxlen=30 above)
padded_sequence = pad_sequences([sequence], maxlen=X.shape[1], padding='pre')

# predict the next word
predicted = model.predict(padded_sequence, verbose=0)

In [36]:
predicted.shape

(1, 130)

In [37]:
predicted

array([[6.29430906e-06, 3.79857048e-02, 7.75526389e-02, 8.86314735e-03,
        1.10464655e-02, 4.32707630e-02, 6.73732087e-02, 6.52578771e-02,
        8.34724028e-03, 2.65747984e-03, 8.85247812e-03, 3.35576758e-03,
        3.50720808e-02, 4.70448807e-02, 8.16076645e-04, 1.33979991e-02,
        7.72563275e-03, 9.16553568e-03, 2.23168731e-02, 7.29017109e-02,
        8.03470984e-03, 1.05555849e-02, 1.14927425e-04, 2.83351430e-04,
        2.25590747e-02, 2.87707546e-03, 1.04474777e-03, 4.68476070e-03,
        2.98431143e-04, 1.75480768e-02, 6.38871454e-04, 1.66324768e-02,
        1.50563181e-04, 1.32220288e-04, 3.73655581e-04, 2.41061277e-03,
        1.31749338e-03, 1.90142915e-03, 1.24828657e-02, 2.74327613e-04,
        1.16656965e-03, 1.13506326e-02, 6.17087935e-04, 1.87613143e-04,
        6.71785884e-03, 5.44627802e-03, 4.56092414e-04, 2.16714441e-04,
        2.02966548e-04, 8.21999129e-05, 5.43642636e-05, 4.61180207e-05,
        4.50699590e-05, 2.71948375e-05, 1.50392123e-04, 1.023372

In [38]:
import numpy as np
predicted_word_index = np.argmax(predicted, axis=-1)[0]

In [39]:
predicted_word_index

2

In [41]:
for word, index in tokenizer.word_index.items():
    if index == predicted_word_index:
        print(f"Predicted word: {word}")
        break

Predicted word: static
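The loop above scans the whole vocabulary for every lookup. Keras's `Tokenizer` also stores the inverse mapping in `tokenizer.index_word`, so `tokenizer.index_word[predicted_word_index]` returns the same word directly (index 0, the padding slot, has no entry). The same idea on plain Python data, with a hypothetical `decode_argmax` helper and made-up probabilities:

```python
def decode_argmax(probabilities, index_word):
    """Pick the most probable index and map it back to a word (None for padding)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index_word.get(best)


word_to_id = {"a": 1, "static": 2, "class": 3}
id_to_word = {i: w for w, i in word_to_id.items()}  # invert word -> index
print(decode_argmax([0.0, 0.1, 0.6, 0.3], id_to_word))  # static
```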


In [49]:
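import time

text = "struct is typically used for simple data structures"
for i in range(5):
    # tokenize the current text (words missing from word_index are dropped)
    sequence = tokenizer.texts_to_sequences([text])[0]

    # pad to the length the model was trained on (X.shape[1] == 29)
    padded_sequence = pad_sequences([sequence], maxlen=X.shape[1], padding='pre')

    # predict the next word greedily (most probable index)
    predicted_word_index = np.argmax(model.predict(padded_sequence))

    # map the index back to its word and append it to the text
    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            text += " " + word
            print(text)
            time.sleep(2)  # pause so each step is visible as it is printed
            break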

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 58ms/step
struct is typically used for simple data structures g
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 27ms/step
struct is typically used for simple data structures g g
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 31ms/step
struct is typically used for simple data structures g g preferred
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
struct is typically used for simple data structures g g preferred preferred
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
struct is typically used for simple data structures g g preferred preferred preferred
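In the output above, "g" is a fragment of "e.g." (the tokenizer's punctuation filter splits it into "e" and "g"), and the repeated "preferred" is likely a symptom of greedy decoding: always taking `argmax` from a weakly trained model tends to lock onto the same high-probability word. One standard alternative, not used in this notebook, is temperature sampling: rescale the probabilities as p^(1/T), renormalise, and draw from them, so a low T behaves almost like argmax and a higher T gives more varied output. A minimal pure-Python sketch; the helper names and the seed are illustrative assumptions:

```python
import math
import random


def apply_temperature(probabilities, temperature):
    """Reweight a distribution: p_i ** (1 / T), renormalised to sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space; clamp zeros so log() is defined.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probabilities]
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probabilities, temperature, rng):
    """Draw one index from the temperature-adjusted distribution."""
    weights = apply_temperature(probabilities, temperature)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed only so the demo is repeatable
probs = [0.05, 0.6, 0.3, 0.05]
print(apply_temperature(probs, 0.1))  # almost all mass on index 1
print([sample_index(probs, 1.5, rng) for _ in range(10)])
```

In the generation loop, `predicted_word_index = sample_index(model.predict(padded_sequence, verbose=0)[0], 0.8, rng)` could replace the `np.argmax` line; 0.8 is an arbitrary example temperature.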


In [50]:
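# how to improve the model
# 1. Increase the number of epochs
# 2. Use more training data
# 3. Tune the hyperparameters (e.g., embedding size, number of LSTM units, learning rate)
# 4. Use a more complex model (e.g., stacked or bidirectional LSTMs, GRUs, or Transformer-based models such as GPT and BERT)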