# <center>Big Data &ndash; Exercises &amp; Solutions</center>
## <center>Spring 2019 &ndash; Week 4 &ndash; ETH Zurich</center>

## Introduction
This exercise will cover XML and JSON well-formedness.

We recommend using an online validator to check your solutions for well-formedness, such as [xmlvalidation.com](https://www.xmlvalidation.com/) for XML and [JSONLint](https://jsonlint.com/) for JSON. For editing XML files, [this formatter](https://www.webtoolkitonline.com/xml-formatter.html), which offers syntax highlighting and formatting, may work better.  
If you prefer, you can instead install [oXygen](https://www.oxygenxml.com/xml_editor/software_archive_editor.html), an XML/JSON development IDE. You should have received a license key for it by email.


## 1. XML


### 1.1 Well-formedness

For each of the following XML documents, state whether it is well-formed. If it is not, correct the errors.
An online editor may help with these tasks, but first try to solve them without software support.

#### Document 1
```
<Burger>
    <Bun>
        <Pickles/>
        <Cheese/>
        <Patty/>
    </Bun>
</Burger>
```

#### Document 2
```
<Burger>
    <Bun>
        <Pickles/>
        <Cheese/>
        <Patty/>
    </Bun>
</Burger>
<Cola>
    <Sugar/>
    <Water/>
</Cola>
<Fries kind="French"/>
```

#### Document 3
```
<Email>
    Dear Mr. John Doe,

    I hereby kindly request you to revise the attached draft 
    of the merger contract and provide me with any feedback you may have.

    Best regards,
    Jane Doe
    Legal Department.
</Email>
P.S. Also, please stop sending me funny cat pictures; I laugh all day and cannot work.
```

#### Document 4
```
<Book>
    <!-- Hardcover -->
    <Title>Creating Your Own Empire</Title>
    <Author>John Doe</Author>
    <Year>2019</Year>
    <Publisher>Humble Press Ltd.</Publisher>
</Book>
<!-- I should try it out one day. -->
```

#### Document 5
```
<Book>
    <!-- Hard--cover -->
    <Title>Creating Your Own Empire</Title>
    <Author>John Doe</Author>
    <Year>2019</Year>
    <Publisher>Humble Press Ltd.</Publisher>
</Book>
<!-- I should try it out one day. -->
```

#### Document 6
```
<Band name="Metallica" 
      member="James Hetfield" 
      member="Lars Ulrich" 
      member="Kirk Hammett" 
      member="Robert Trujillo"/>
```
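
Once you have solved the exercise by hand, you may want to check your answers locally rather than in an online editor. The following is a minimal sketch, assuming Python 3 with only the standard library; the helper name `is_well_formed` is our own and not part of any library. It relies on `xml.etree.ElementTree.fromstring` raising `ParseError` when the input is not well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The message includes the line and column where parsing failed.
        return False, str(error)
    return True, None


if __name__ == "__main__":
    for candidate in ["<a><b/></a>", "<a><b></a>", "<a>"]:
        print(repr(candidate), is_well_formed(candidate))
```

Note that this checks well-formedness only; it does not validate a document against a schema.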

### 1.2 Create your own XML
Copy the text of the introduction above (from the title through the words 'by email') and paste it into an editor as plain text. Then write an XML document that keeps the same content and also captures the formatting (title, sections, style, links, etc.).

**Questions**
1. Is your XML well-formed? If not, correct any mistakes. You can use an editor to help you with this task.
1. Is this data structured, unstructured, or semi-structured?


### 1.3 XML Names
Which of the following are valid XML Names?
1. `<_bar/>`
1. `<Xmlelement/>`
1. `<Foo/>`
1. `<foo123/>`
1. `<foo_123/>`
1. `<foo-123/>`
1. `<foo#123/>`
1. `<foo.123/>`
1. `<-123/>`
1. `<123foo/>`
1. `<doctype/>`
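
To experiment with the naming rules, a small checker can help. The sketch below is a deliberately simplified, ASCII-only approximation of the XML 1.0 `Name` production (the real production also admits many non-ASCII characters); the function names `is_ascii_xml_name` and `is_reserved` are our own. It also flags names that start with "xml" in any letter case, which the specification reserves for standardization.

```python
import re

# ASCII-only approximation: a name starts with a letter, "_" or ":",
# followed by any number of letters, digits, "_", ":", "." or "-".
ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_ascii_xml_name(candidate):
    """True if candidate matches the simplified Name pattern above."""
    return ASCII_XML_NAME.fullmatch(candidate) is not None


def is_reserved(candidate):
    """True if candidate starts with "xml" in any letter case."""
    return candidate.lower().startswith("xml")


if __name__ == "__main__":
    for name in ["item_1", "1item", "my-item.v2", "a b", "XMLthing"]:
        print(name, is_ascii_xml_name(name), is_reserved(name))
```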

### 1.4 Predefined entities
XML has only 5 predefined entities. Associate each escape code with the corresponding character.

Escape code | Character (shuffled)
--|--
`&lt;` | `>`
`&amp;` | `"`
`&gt;` | `'`
`&quot;` | `&`
`&apos;` | `<`
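
In practice, escaping is usually left to a library. The following sketch assumes Python 3 and uses `escape` and `unescape` from `xml.sax.saxutils`. By default these functions handle `&`, `<` and `>` only, so the two quote characters are passed in explicitly. The helper names `escape_all` and `unescape_all` are our own.

```python
from xml.sax.saxutils import escape, unescape

# escape() replaces &, < and > by default; the quote characters are
# only replaced when supplied through the optional entities mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {code: char for char, code in QUOTES.items()}


def escape_all(text):
    """Escape all five characters that have a predefined entity."""
    return escape(text, QUOTES)


def unescape_all(text):
    """Inverse of escape_all."""
    return unescape(text, UNQUOTES)


sample = 'She said "it\'s 3 < 5 & 5 > 3"'
escaped = escape_all(sample)

if __name__ == "__main__":
    print(escaped)
    print(unescape_all(escaped) == sample)
```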

## 2. JSON
### 2.1 JSON Values
JSON objects are composed of name&ndash;value pairs. List the 6 possible JSON value types.

### 2.2 JSON well-formedness
For each of the following JSON documents, state whether it contains any syntax errors. If so, correct them.

#### Document 1
```
{
  "burger" : {
      "bun" : ["pickles", "cheese", "patty"],
      "extraIngredients" :
  }
}
```

#### Document 2
```
{
  "pizza" : {
      "topping" : "salami",
      "topping" : "cheese",
      "topping" : "oregano"
  }
}
```

#### Document 3
```
{
    name : "John Doe",
    age : 42,
    occupation : "Penguin Turner",
    motto : "Up is life!"
}
```
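
To check your answers locally, Python's standard `json` module can serve as a rough syntax checker. The sketch below is a minimal example under that assumption; the function name `check_json` is our own. Keep in mind that `json.loads` is lenient in some respects: it accepts `NaN` and `Infinity`, and it silently accepts repeated names within one object, keeping the last value. The JSON grammar permits repeated names, but RFC 8259 says they should be unique, so the sketch reports them separately.

```python
import json


def check_json(text):
    """Return (parses, duplicate_names) for a JSON text."""
    duplicates = []

    def collect_pairs(pairs):
        # Called once per JSON object with its (name, value) pairs in order.
        seen = set()
        for name, _ in pairs:
            if name in seen:
                duplicates.append(name)
            seen.add(name)
        return dict(pairs)

    try:
        json.loads(text, object_pairs_hook=collect_pairs)
    except json.JSONDecodeError:
        return False, []
    return True, duplicates


if __name__ == "__main__":
    for candidate in ['{"a": 1, "b": [true, null]}', '{"a": 1, "a": 2}', '{"a": 1,}']:
        print(candidate, check_json(candidate))
```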

## 3. XML Namespaces

For each element and attribute in the following XML document, state to which namespace it belongs.

```xml
<ff:Burger
    xmlns:geo="http://example.com/geography"
    xmlns:spi="http://example.com/food/spices"
    xmlns:meat="http://example.com/food/meat"
    xmlns:veg="http://example.com/food/vegetables"
    xmlns:dai="http://example.com/food/dairy"
    xmlns:ff="http://example.com/food/fast">
  <ff:Bun remarks="gluten-free">
      <dai:Cheese dai:remarks="Made of free-range cow milk"/>
      <veg:Tomatoes/>
      <ff:Patty>
          <meat:Beef geo:origin="Switzerland"/>
          <spi:Salt with-iodine="yes"/>
          <spi:Pepper geo:origin="Sri Lanka"/>
      </ff:Patty>
  </ff:Bun>  
</ff:Burger>
```
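
A parser can show how prefixes resolve. The following sketch assumes Python 3 and `xml.etree.ElementTree`, which reports every name in the expanded form `{namespace-URI}local-name`. The small document and its URIs are made up for illustration, and the helper name `expanded_names` is our own. Names that come back without a `{...}` part are in no namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical example document, unrelated to the exercise above.
DOC = (
    '<s:shelf xmlns:s="http://example.com/shelf" '
    'xmlns:b="http://example.com/book">'
    '<b:book b:isbn="123" lang="en"/>'
    "</s:shelf>"
)


def expanded_names(xml_text):
    """List (tag, attributes) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(element.tag, dict(element.attrib)) for element in root.iter()]


if __name__ == "__main__":
    for tag, attributes in expanded_names(DOC):
        print(tag, attributes)
```

The `xmlns:...` declarations themselves do not show up among the attributes; the parser consumes them while resolving prefixes.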



----

## Extra material

## Conversions from a relational database

Imagine that we have a relational table which stores chat messages. Translate this table into XML and JSON. Consult the lecture slides 6-18.

conversation_id | people | sender | content | timestamp | is_read | attachment_id
--|--|--|--|--|--|--
42|charlie,ari,jesse|charlie|hey, here's the doc ><|1510410193|TRUE|NULL
42|charlie,ari,jesse|charlie|NULL|1510410244|TRUE|doc_6492
42|charlie,ari,jesse|ari|thanks!|1510432987|FALSE|NULL
17|rudy,sage|rudy|look at this cute cat!|1500897189|TRUE|img_91847
17|rudy,sage|NULL|aww|1506610190|TRUE|NULL
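
As a starting point, the first row of the table could be converted programmatically. The following is only one possible modeling, sketched in Python 3 with the standard library. The element names (`message`, `people`, `person`) and the decisions to store `people` as a list, to map `NULL` to JSON `null`, and to omit the element for `NULL` in XML are our own assumptions, not the reference solution.

```python
import json
import xml.etree.ElementTree as ET

# First row of the table, with assumed types for each column.
row = {
    "conversation_id": 42,
    "people": ["charlie", "ari", "jesse"],
    "sender": "charlie",
    "content": "hey, here's the doc ><",
    "timestamp": 1510410193,
    "is_read": True,
    "attachment_id": None,  # NULL in the relational table
}

# JSON: None becomes null, True becomes true, the list becomes an array.
as_json = json.dumps(row)


def row_to_xml(row):
    """Serialize one row as a <message> element (hypothetical layout)."""
    message = ET.Element("message", conversation_id=str(row["conversation_id"]))
    people = ET.SubElement(message, "people")
    for name in row["people"]:
        ET.SubElement(people, "person").text = name
    for column in ("sender", "content", "timestamp", "is_read", "attachment_id"):
        value = row[column]
        if value is None:
            continue  # NULL is modeled by leaving the element out
        child = ET.SubElement(message, column)
        child.text = str(value).lower() if isinstance(value, bool) else str(value)
    # tostring() escapes "<", ">" and "&" in text content.
    return ET.tostring(message, encoding="unicode")


if __name__ == "__main__":
    print(as_json)
    print(row_to_xml(row))
```

Other designs are equally defensible, for example grouping messages under a `conversation` element so that `people` is not repeated for every message.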