# Lexer generation in Python

Consider the `gen_tokenizer.py` Python program provided in the course's resources. It generates a Python tokenizer from a JSON file containing token types and associated regular expressions.

Write JSON files with token definitions to generate lexers and solve the following exercises.
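As a reminder of what a table-driven tokenizer does with such definitions, here is a minimal sketch using only the standard library. The JSON layout (token name mapped to a regular expression), the `SKIP` and `NEWLINE` conventions, and the names `SPEC_JSON` and `build_tokenizer` are assumptions made for illustration; the format actually expected by `gen_tokenizer.py` is the one described in the course's resources.

```python
import json
import re

# Hypothetical token table: NOT necessarily the format read by gen_tokenizer.py.
SPEC_JSON = r'''
{
  "NUMBER": "\\d+",
  "ID": "[A-Za-z_][A-Za-z0-9_]*",
  "OP": "[+*=-]",
  "SKIP": "[ \\t]+",
  "NEWLINE": "\\n"
}
'''

def build_tokenizer(spec):
    """Combine all token regexes into one alternation of named groups."""
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in spec.items())
    master = re.compile(pattern)

    def tokenize(text):
        pos, line = 0, 1
        tokens = []
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise ValueError(f"Unexpected character {text[pos]!r} on line {line}")
            kind = m.lastgroup          # name of the alternative that matched
            if kind == "NEWLINE":
                line += 1               # track lines, emit no token
            elif kind != "SKIP":
                tokens.append((kind, m.group(), line))
            pos = m.end()
        return tokens

    return tokenize

tokenize = build_tokenizer(json.loads(SPEC_JSON))
print(tokenize("x = 42\ny = x + 1"))
```

The order of the entries matters in this sketch: alternatives are tried left to right, so a keyword rule would have to come before the general `ID` rule.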

# Exercise 1

Consider the [SQL](https://en.wikipedia.org/wiki/SQL) language, used to manage data in relational databases. Consider the following SQL instructions as examples.

```sql
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  name TEXT,
  age INTEGER
);
```

```sql
INSERT INTO users (name, age) VALUES ('Alice', 30);
```

```sql
SELECT name FROM users WHERE age > 25;
```

Generate a lexical analyzer for a fragment of the SQL query language using the provided lexer generator.

# Exercise 2

Consider the [C](https://en.wikipedia.org/wiki/C_(programming_language)) general-purpose programming language. Consider the following C code snippets as examples.

```c
#include <stdio.h>

int main()
{
    printf("hello, world\n");
}
```

```c
#include <stdio.h>

int main(int argc, char **argv) {
    int a = 5, b = 3;
    int sum = a + b;

    printf("Sum: %d\n", sum);
    return 0;
}
```

Generate a lexical analyzer for a fragment of the C programming language using the provided lexer generator.

# Lexer generation with `Ply.lex`

Use the lexical analyzer provided by `Ply.lex` to solve the following exercises in Python.
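For orientation, the sketch below shows the usual shape of a PLY lexer on a toy language of assignments and sums, which is not one of the exercise languages. It assumes the PLY package is installed and imported as `ply.lex`; the token set and the helper `tokenize` are illustrative choices, not part of the course material.

```python
import ply.lex as lex

# PLY looks for a module-level sequence named `tokens`.
tokens = ("ID", "NUMBER", "PLUS", "EQUALS")

# Simple tokens: a string named t_<TOKEN> holding the regular expression.
t_PLUS = r"\+"
t_EQUALS = r"="

# Characters skipped between tokens (a plain string, not a regex).
t_ignore = " \t"

def t_ID(t):
    r"[A-Za-z_][A-Za-z0-9_]*"
    return t

def t_NUMBER(t):
    r"\d+"
    t.value = int(t.value)          # convert the lexeme to a Python int
    return t

def t_newline(t):
    r"\n+"
    t.lexer.lineno += len(t.value)  # PLY does not count lines by itself

def t_error(t):
    print(f"Illegal character {t.value[0]!r} at line {t.lineno}")
    t.lexer.skip(1)

lexer = lex.lex()

def tokenize(text):
    """Hypothetical helper: return (type, value, line) for every token."""
    lexer.lineno = 1                # input() does not reset the line counter
    lexer.input(text)
    return [(tok.type, tok.value, tok.lineno) for tok in lexer]

print(tokenize("x = 1 + 2\ny = x"))
```

Rules written as functions are tried in the order they are defined, while rules written as strings are sorted by decreasing regex length, which matters once keywords and identifiers overlap.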

# Exercise 3

Recall Exercise 1, which was solved with the provided lexer generator. Solve it now using the `Ply.lex` module.

Consider the SQL query language and the following instructions.


```sql
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  name TEXT,
  age INTEGER
);
```

```sql
INSERT INTO users (name, age) VALUES ('Alice', 30);
```

```sql
SELECT name FROM users WHERE age > 25;
```

## Exercise 3.1

Generate a lexical analyzer for a fragment of the SQL query language using `Ply.lex`.

## Exercise 3.2

Write a function that, using the lexer, identifies and returns all occurrences of identifiers, literal integers, and literal strings.

For instance, for the examples above:
```python
{
  "id": ["id", "name", "age", "users"],
  "string": ["Alice"],
  "int": [30, 25]
}
```

# Exercise 4

Recall Exercise 2, which was solved with the provided lexer generator. Solve it now using the `Ply.lex` module.

Consider the C programming language and the following code snippets.

```c
#include <stdio.h>

int main() {

    printf("Hello World");

    return 0;
}
```

```c
#include <stdio.h>

int main(int argc, char **argv) {
    int a = 5, b = 3;
    int sum = a + b;

    printf("Sum: %d\n", sum);
    return 0;
}
```

## Exercise 4.1

Generate a lexical analyzer for a fragment of the C programming language using `Ply.lex`.

## Exercise 4.2

Write a function that, using the lexer, identifies all identifiers that occur in a snippet of C code.

For instance, for the first example above, it should return `["main", "printf"]`, and for the second `["main", "printf", "argc", "argv", "a", "b", "sum"]`.

## Exercise 4.3

Adapt the previous exercise so that each identifier occurrence also reports the *line* in which it occurs.

For instance, for the first example above, it should return `[("main",[3]), ("printf",[5])]`, and for the second `[("main",[3]), ("printf",[7]), ("argc",[3]), ("argv",[3]), ("a",[4,5]), ("b",[4,5]), ("sum",[5,7])]`.

## Exercise 4.4

Consider now that the code can also contain single-line and block *comments*. Report all occurring identifiers and the line they occur in.

For the example below, it should return `[("main",[9]), ("printf",[12])]`.
```c
// Import input/output functions
#include <stdio.h>

/*
Main function:
- says hello
- returns
*/
int main() {

    // printf is used to produce output
    printf("Hello World");

    return 0;
}
```
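One possible way to handle comments in PLY is to match them with rules that return nothing, so the token is discarded, while still counting the newlines inside block comments. The sketch below illustrates only that idea on a reduced token set; it assumes PLY is installed, and the helper `scan` and the single `ID` token are illustrative, not a complete C lexer.

```python
import ply.lex as lex

tokens = ("ID",)

t_ignore = " \t"

def t_BLOCK_COMMENT(t):
    r"/\*(.|\n)*?\*/"
    # Keep the line counter in sync with the newlines inside the comment.
    t.lexer.lineno += t.value.count("\n")
    # No return value: the comment is discarded.

def t_LINE_COMMENT(t):
    r"//[^\n]*"
    pass  # discarded; the trailing newline is left for t_newline

def t_ID(t):
    r"[A-Za-z_][A-Za-z0-9_]*"
    return t

def t_newline(t):
    r"\n+"
    t.lexer.lineno += len(t.value)

def t_error(t):
    # Sketch only: silently skip anything that is not covered by a rule.
    t.lexer.skip(1)

lexer = lex.lex()

def scan(text):
    """Hypothetical helper: list (identifier, line) pairs outside comments."""
    lexer.lineno = 1
    lexer.input(text)
    return [(tok.value, tok.lineno) for tok in lexer]

print(scan("// first\n/* two\nlines */ alpha\nbeta // trailing\n"))
```

Note that this reduced lexer has no rule for string literals or keywords, so words inside `"Hello World"` and words such as `int` or `return` would also be reported as `ID` until dedicated rules are added.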

# Exercise 5

Consider the [XML](https://en.wikipedia.org/wiki/XML) markup language used to store and transmit data. Use the lexer for XML generated using `Ply.lex` provided in the course's resources to solve the following exercises.

## Exercise 5.1

Write a function to convert an XML file into a Python dictionary and dump it as JSON. For instance, for the following XML:
```xml
<pessoa>
    <nome>Maria</nome>
    <idade>32</idade>
</pessoa>
<pessoa>
    <nome>Manuel</nome>
    <idade>53</idade>
</pessoa>
```

You should produce the dictionary:
```python
{ "pessoa" : [ { "nome" : "Maria", "idade" : 32 }, { "nome" : "Manuel", "idade" : 53 } ] }
```
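The dictionary above groups repeated tags into a list and converts numeric text into integers. A stack is one way to build such a structure from a token stream, as in the sketch below. The token shape used here, pairs of `("OPEN", name)`, `("TEXT", text)` and `("CLOSE", name)`, is an assumption made for illustration; the lexer provided in the course's resources may use different token names and values, and attributes are not handled.

```python
import json

def to_value(text):
    """Convert purely numeric text to int; leave anything else as a string."""
    return int(text) if text.isdecimal() else text

def add_child(parent, name, value):
    """Store value under name; repeated names are collected in a list."""
    if name not in parent:
        parent[name] = value
    elif isinstance(parent[name], list):
        parent[name].append(value)
    else:
        parent[name] = [parent[name], value]

def tokens_to_dict(tokens):
    root = {}
    # Each stack entry: (tag name, children dictionary, pieces of text).
    stack = [("", root, [])]
    for kind, value in tokens:
        if kind == "OPEN":
            stack.append((value, {}, []))
        elif kind == "TEXT":
            if value.strip():                 # ignore whitespace-only text
                stack[-1][2].append(value.strip())
        elif kind == "CLOSE":
            name, children, text = stack.pop()
            if name != value:
                raise ValueError(f"Unexpected closing tag </{value}>")
            content = children if children else to_value("".join(text))
            add_child(stack[-1][1], name, content)
    return root

# Hypothetical token stream for the XML example above.
EXAMPLE_TOKENS = [
    ("OPEN", "pessoa"), ("TEXT", "\n    "),
    ("OPEN", "nome"), ("TEXT", "Maria"), ("CLOSE", "nome"),
    ("OPEN", "idade"), ("TEXT", "32"), ("CLOSE", "idade"),
    ("CLOSE", "pessoa"),
    ("OPEN", "pessoa"),
    ("OPEN", "nome"), ("TEXT", "Manuel"), ("CLOSE", "nome"),
    ("OPEN", "idade"), ("TEXT", "53"), ("CLOSE", "idade"),
    ("CLOSE", "pessoa"),
]

print(json.dumps(tokens_to_dict(EXAMPLE_TOKENS), indent=2))
```

In this sketch an element that has child elements keeps only its children, so mixed content (text next to child tags) is dropped; whether that is acceptable depends on the XML files being converted.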