# Mastering TaQL

*Or: Don't hate TaQL*

TaQL is part of [casacore](https://github.com/casacore/casacore), a set of libraries for radio astronomy data processing.

TaQL stands for "Table Query Language", but it can also be used without tables.

## Using this notebook

This notebook is written in Jupyter, but it runs on the TaQL kernel, so code cells are evaluated as TaQL. If you select a code cell, you can press **Shift-Enter** to evaluate it. If you press **Alt-Enter**, the cell is evaluated and a new cell is inserted below it.

You can evaluate all the TaQL commands already present in this notebook. To understand the commands, try to predict the outcome of a statement before evaluating it. You are also encouraged to change the commands: you can enter any valid TaQL statement in this notebook.

Navigation in Jupyter notebooks can be tricky. In *command mode* (not editing a cell), many keyboard shortcuts are active, like `"j"` and `"k"` for moving down and up, `"a"` for inserting a cell, or `"dd"` for deleting one (these should be familiar if you know `vi`). If you want to type in a cell, make sure you are in *edit mode* by checking that you see a blinking cursor.

  * To go to edit mode, press Enter or double click a cell.
  * To go back to command mode, press Esc or single click another cell.

<div class="alert alert-success">**Exercise**: predict what will happen if you type "`add`" in command mode, then try it to see if you're right.</div>

## TaQL as a calculator

Although it's not the main intended use, you can use TaQL like a regular calculator:

In [None]:
6*7

Exponentiation is done using `**` (as in Python). The operator `^` is a bitwise *xor* – if you don't know what that is, just don't use `^`.

In [None]:
5^3

In [None]:
5**3
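Python uses the same two operators with the same meanings, so a quick check in plain Python (not TaQL) can help you predict the two results above:

```python
# Exponentiation: 5 raised to the power 3
power = 5 ** 3
# Bitwise xor: 0b101 ^ 0b011 == 0b110
xor = 5 ^ 3
print(power, xor)  # prints: 125 6
```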

TaQL can do complex numbers as well, where the complex unit can be given as `i` or `j`:

In [None]:
(3+4i)/(8-1j)
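To check the result by hand, multiply numerator and denominator by the conjugate of the denominator. A minimal sketch in Python (which only accepts `j` as the imaginary unit):

```python
# (3+4j)/(8-1j): multiply top and bottom by the conjugate (8+1j)
z = (3 + 4j) / (8 - 1j)
# Numerator: (3+4j)(8+1j) = 20 + 35j; denominator: 8**2 + 1**2 = 65
expected = (20 + 35j) / 65
print(z)
```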

Most functions you expect to work are actually there:

In [None]:
sin(pi()/2)

<div class="alert alert-success">**Exercise**: try some functions and see if they work. Does TaQL respect operator precedence?</div>

### Arrays and indexing

Unlike some other languages, TaQL supports lists (or actually arrays):

In [None]:
[10,34,21,0,-3.4,8]

A lot of functions exist specifically for lists:

In [None]:
mean([10,34,21,0,-3.4,8])

In [None]:
stddev([10,34,21,0,-3.4,8])

<div class="alert alert-success">**Exercise**: use the function `sumsqr` (sum of squares) to compute the length of the vector *(3, 4)*.</div>

Arrays can be created using Python-style **ranges**. `0:5` creates the range `0,1,2,3,4` (note that the end is *exclusive*). An optional third part specifies the step size: `0:16:5` creates the range `0,5,10,15`. To make an array from a range, enclose it in square brackets.
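Python's built-in `range` has the same start, end and step semantics, including the exclusive end, so it can serve as a reference for the two ranges above (Python, not TaQL):

```python
# 0:5 -> start 0, end 5 (exclusive), step 1
r_exclusive = list(range(0, 5))
# 0:16:5 -> start 0, end 16 (exclusive), step 5
r_step = list(range(0, 16, 5))
print(r_exclusive)  # [0, 1, 2, 3, 4]
print(r_step)       # [0, 5, 10, 15]
```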

<div class="alert alert-success">**Exercise**: create the array `[13, 17, 21, … 45, 49]` and compute its average.</div>

Multiple ranges can be combined:

In [None]:
[1:5, 10, 15:25:3]
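Assuming TaQL simply concatenates the parts in order, the array above can be reproduced in Python by joining the pieces:

```python
# Equivalent of the TaQL array [1:5, 10, 15:25:3]
combined = list(range(1, 5)) + [10] + list(range(15, 25, 3))
print(combined)  # [1, 2, 3, 4, 10, 15, 18, 21, 24]
```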

**Indexing** arrays also works with ranges. The start defaults to 0 and the end defaults to the end of the array. For example, `a[4:13:3]` takes the elements with indices 4, 7 and 10. Again, note that the end is *exclusive*.

In [None]:
[10,34,21,0,-3.4,8][::2]
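Python slices follow the same start:end:step rule, so they can be used to check an indexing expression before trying it in TaQL (Python, not TaQL):

```python
a = [10, 34, 21, 0, -3.4, 8]
# Every second element, starting at index 0
every_second = a[::2]
# The indices selected by the slice 4:13:3 (end exclusive)
picked = list(range(4, 13, 3))
print(every_second)  # [10, 21, -3.4]
print(picked)        # [4, 7, 10]
```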

Higher-dimensional arrays are supported:

In [None]:
[[37,  1,  3],[34,  5,  7]]

It is also possible to specify a higher-dimensional array by giving the data and the shape separately:

In [None]:
array([37,1,3,34,5,7], 2, 3)

In [None]:
means([[37,  1,  3],[34,  5,  7]], 2)

As you see, this yields an array of two rows and three columns.

**Note**: the command line version `taql` prints this as
```
Axis Lengths: [3, 2]  (NB: Matrix in Row/Column order)
[37, 34
 1, 5
 3, 7]
```
In this notebook, it is printed in Column/Row order, just like in Python and C.

Indexing higher-dimensional arrays works with `[…, …]`. Leaving out an indexing expression selects the entire axis.

<div class="alert alert-success">**Exercise**: add an indexing expression in the array below to select the row with values *(37, 1, 3)*. Afterwards, change the indexing expression to select the column with values *(1, 5)*.</div>

In [None]:
array([37,1,3,34,5,7], 2, 3)[ *** ]

To take the mean over a subset of the axes of a higher-dimensional array, use `mean`**`s`** (similar to passing `axis` to NumPy's `mean`):

In [None]:
means(array([37,1,3,34,5,7], 2, 3), 0)

In [None]:
means(array([37,1,3,34,5,7], 2, 3), 1)
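In NumPy the same computations would be `np.mean(a, axis=0)` and `np.mean(a, axis=1)`. The plain-Python sketch below assumes that the notebook numbers axes in C order starting at 0 (axis 0 runs down the rows, axis 1 along each row); it is an analogy, not how TaQL computes `means`:

```python
from statistics import mean

# The 2x3 array built by array([37,1,3,34,5,7], 2, 3), stored row by row
data = [37, 1, 3, 34, 5, 7]
rows, cols = 2, 3
a = [data[r * cols:(r + 1) * cols] for r in range(rows)]  # [[37, 1, 3], [34, 5, 7]]

# Mean over axis 0: average each column across the rows
means_axis0 = [mean(column) for column in zip(*a)]
# Mean over axis 1: average each row across the columns
means_axis1 = [mean(row) for row in a]
print(means_axis0)  # [35.5, 3, 5]
print(means_axis1)  # roughly [13.667, 15.333]
```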

Most operators and functions act sensibly when you apply them to an array:

In [None]:
3 + [1,2,3]

In [None]:
sin([pi()/2, pi()/4, pi()/3, pi()/6])

### Comparison

In [None]:
sqrt(2)/2 == sin(pi()/4)

Why isn't `sin(pi()/4)` equal to `sqrt(2)/2`? The answer is floating-point precision: computers cannot represent most real numbers exactly, so both sides are rounded and end up slightly different.

TaQL has a function `near` that checks whether two numbers are relatively near each other, with a default tolerance of *10<sup>-13</sup>*.

In [None]:
near(sqrt(2)/2, sin(pi()/4), 1.e-13)

For testing with a relative tolerance of *10<sup>-5</sup>* (useful for single precision numbers), you can use the shorthand operator `~=` (which resembles ≅):

In [None]:
sqrt(2)/2 ~= sin(pi()/4)
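Python's `math.isclose` does a similar relative comparison. Treat this as an analogy only: casacore's `near` may handle edge cases (such as values close to zero) differently.

```python
import math

a = math.sqrt(2) / 2
b = math.sin(math.pi / 4)
# Exact comparison can fail, because both values are rounded to the nearest double
print(a == b)
# Relative comparisons in the spirit of near(a, b, 1e-13) and a ~= b (tolerance 1e-5)
close_strict = math.isclose(a, b, rel_tol=1e-13)
close_loose = math.isclose(a, b, rel_tol=1e-5)
print(close_strict, close_loose)  # True True
```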

### Sets and intervals

To test whether a value is in some set, you can use the keyword `in`, where the set is specified like a one-dimensional array:

In [None]:
4 in [4:10:3]

You can also test whether a number lies in a continuous interval. Use `<` or `>` for an open end (exclusive) and `{` or `}` for a closed end (inclusive); the two kinds can be mixed in one interval.

The TaQL version of *4 ∈ (3,4]* is

In [None]:
4 in <3,4}

The TaQL version of *4 ∈ (3,4)* is

In [None]:
4 in <3,4>

Either end of the interval can be omitted, meaning that side is unbounded. The TaQL version of *4 ∈ (3,∞)* is

In [None]:
4 in <3,>

The left-hand side can be an array as well:

In [None]:
[2:15:3] in <1,5>
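Python has no interval literal, but its `range` supports the same set-membership test, and a small helper can mimic the interval syntax. `in_interval` and its parameters are hypothetical names made up for this sketch:

```python
def in_interval(x, lo=None, hi=None, lo_closed=False, hi_closed=False):
    """Test x against an interval; None means that side is unbounded.

    lo_closed/hi_closed play the role of TaQL's '{' / '}' (inclusive);
    otherwise the side is open, like '<' / '>'.
    """
    if lo is not None and (x < lo or (x == lo and not lo_closed)):
        return False
    if hi is not None and (x > hi or (x == hi and not hi_closed)):
        return False
    return True

print(4 in range(4, 10, 3))                  # 4 in [4:10:3] -> True
print(in_interval(4, 3, 4, hi_closed=True))  # 4 in <3,4}   -> True
print(in_interval(4, 3, 4))                  # 4 in <3,4>   -> False
print(in_interval(4, lo=3))                  # 4 in <3,>    -> True
# An array on the left-hand side gives one boolean per element
print([in_interval(x, 1, 5) for x in range(2, 15, 3)])
```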

## Units

TaQL has basic support for units, even obscure ones.

In [None]:
4m + 3in
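The arithmetic behind this sum can be sketched in Python, assuming the result takes the unit of the first operand. The small conversion table here only imitates casacore's much larger unit tables:

```python
# Length units expressed in metres (1 inch is defined as exactly 0.0254 m)
METRES_PER_UNIT = {"m": 1.0, "in": 0.0254}

def to_metres(value, unit):
    return value * METRES_PER_UNIT[unit]

# 4m + 3in, expressed in the unit of the first operand (metres)
total = to_metres(4, "m") + to_metres(3, "in")
print(total)  # approximately 4.0762
```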

SI prefixes like `p`, `n`, `u` (for µ), `m`, `c`, `d`, `da`, `h`, `k`, `M`, `G`, `T` can be used.

<div class="alert alert-success">**Exercise**: use TaQL to check whether a *millifoot* lies in the open interval between 100 and 200 *nano-miles* (this is an accidental feature of TaQL).</div>

Unit support is not perfect: units are not reduced (simplified), even when they cancel or are equivalent:

In [None]:
200m/200m

In [None]:
1/1s+1Hz

Some checking of units is performed:

In [None]:
sqrt(3km)

Values without a unit are assumed to have the same unit as the first value that has one:

In [None]:
1+2deg

In [None]:
[1,2,3m,4]

### Angles

Angles can be given in sexagesimal notation, either as hours, minutes and seconds (`4h56m03.5`) or as degrees, minutes and seconds (`4d12m43.7`), or in radians or degrees:

In [None]:
4h56m03.5 + 4d12m43.7 + 1 deg - 0.3 rad

If you want the result in a different unit, append that unit to an expression:

In [None]:
(4h56m03.5 + 4d12m43.7 + 1 deg - 0.3 rad) deg
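To check this result by hand, convert every term to degrees (one hour of angle is 15 degrees). A minimal Python sketch with hypothetical helper names:

```python
import math

def hms_to_deg(h, m, s):
    # 1 hour of angle = 15 degrees
    return (h + m / 60 + s / 3600) * 15

def dms_to_deg(d, m, s):
    return d + m / 60 + s / 3600

# 4h56m03.5 + 4d12m43.7 + 1 deg - 0.3 rad, in degrees
total_deg = hms_to_deg(4, 56, 3.5) + dms_to_deg(4, 12, 43.7) + 1 - math.degrees(0.3)
print(total_deg)  # about 62.038
```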

The unit of an expression will be the same as the unit of the first component:

In [None]:
0deg + 4h56m03.5 + 4d12m43.7 + 1 deg - 0.3 rad

To format an angle in hours, minutes and seconds, use the function `hms()`. Similarly, to format it in degrees, minutes, seconds, use `dms()`. To format an array with RA-DEC values, use `hdms()`, which formats even elements with `hms()` and odd elements with `dms()`.
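The splitting these functions perform can be sketched in Python. The names `format_hms`, `format_dms` and `format_hdms` are hypothetical, and the exact strings TaQL prints (separators, precision) will differ, since casacore uses its own formatting:

```python
def format_hms(angle_deg):
    """Format an angle (degrees) as hours, minutes, seconds (15 deg = 1 h)."""
    sign = "-" if angle_deg < 0 else ""
    # Work in integer milliseconds to avoid results like 59.9999 seconds
    total_ms = round(abs(angle_deg) / 15 * 3600 * 1000)
    h, rest = divmod(total_ms, 3600 * 1000)
    m, rest = divmod(rest, 60 * 1000)
    return f"{sign}{h:02d}h{m:02d}m{rest / 1000:06.3f}"

def format_dms(angle_deg):
    """Format an angle (degrees) as degrees, arcminutes, arcseconds."""
    sign = "-" if angle_deg < 0 else ""
    total_ms = round(abs(angle_deg) * 3600 * 1000)
    d, rest = divmod(total_ms, 3600 * 1000)
    m, rest = divmod(rest, 60 * 1000)
    return f"{sign}{d:02d}d{m:02d}m{rest / 1000:06.3f}"

def format_hdms(angles_deg):
    """Even positions (0, 2, ...) as hms, odd positions as dms."""
    return [format_hms(a) if i % 2 == 0 else format_dms(a)
            for i, a in enumerate(angles_deg)]

print(format_hms((4 + 56 / 60 + 3.5 / 3600) * 15))  # 04h56m03.500
print(format_dms(4 + 12 / 60 + 43.7 / 3600))        # 04d12m43.700
```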

<div class="alert alert-success">**Exercise**: put the coordinates of Westerbork, *(6.60417°, 52.91692°)* in an array, and format it in the conventional RA-DEC notation.</div>

Functions for calculations with angles are built in, for example for computing the angular distance between two positions:

In [None]:
angdist([6.60417, 52.91692] deg, [0, 90] deg) deg
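For reference, the great-circle distance can be computed with the haversine formula. This is a sketch of the geometry, not necessarily the formula casacore uses internally; since the second position is the celestial pole, the answer should simply be 90° minus the latitude:

```python
import math

def angdist(lon1_deg, lat1_deg, lon2_deg, lat2_deg):
    """Great-circle distance in degrees (haversine formula)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1_deg, lat1_deg, lon2_deg, lat2_deg))
    hav = (math.sin((lat2 - lat1) / 2) ** 2
           + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the argument just above 1
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, hav))))

# Distance from (6.60417, 52.91692) to the pole: 90 - 52.91692 degrees
print(angdist(6.60417, 52.91692, 0, 90))  # about 37.08308
```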

### Pretty-printing

The function `str` exists to format anything nicely into a string. The optional second argument to this function specifies the formatting. For example, you can specify how many digits should be printed:

In [None]:
str(pi(),20.8)

If you speak C, you can also use `printf`-style formatting:

In [None]:
str(pi(), "%20.8f")

In [None]:
str((19+9j)/7, "%.3f + %.3fi")
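Python's `%` operator uses the same `printf`-style conversions, which can help when choosing a format string. In Python the real and imaginary parts must be passed explicitly; the TaQL cell above suggests that `str` fills the two conversions from the real and imaginary parts, but that is an interpretation of the example:

```python
import math

# Width 20, 8 digits after the decimal point
pi_str = "%20.8f" % math.pi
# A complex number needs two conversions: real part, then imaginary part
z = (19 + 9j) / 7
z_str = "%.3f + %.3fi" % (z.real, z.imag)
print(repr(pi_str))  # '          3.14159265'
print(z_str)         # 2.714 + 1.286i
```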

### Dates

![ISO 8601 was published on 06/05/88 and most recently amended on 12/01/04.](https://imgs.xkcd.com/comics/iso_8601.png "XKCD 1179")

Literal dates can be entered directly into TaQL, for example using the above ISO standard (which was introduced after the first version of casacore).

In [None]:
1981-04-01

Warning: the leading zeros are essential here:

In [None]:
1981-4-1

Values can be converted to dates with the function `date()` or `datetime()`. Without arguments, this gives the current date (or date + time).

In [None]:
date(0.)

As you can guess from the above, dates are internally stored as Modified Julian Date (MJD).
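MJD counts days from midnight at the start of 17 November 1858. A minimal Python sketch of the conversion for whole days (the helper names are hypothetical):

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # MJD 0

def to_mjd(d):
    """Modified Julian Date (whole days) of a calendar date."""
    return (d - MJD_EPOCH).days

def from_mjd(mjd):
    """Calendar date of a whole-day MJD."""
    return MJD_EPOCH + timedelta(days=mjd)

print(to_mjd(date(2000, 1, 1)))  # 51544
print(from_mjd(0))               # 1858-11-17
```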

To convert a date to a pretty-printed date, you can use `cdate()`:

In [None]:
cdate(date(0.))

Similarly for showing times there is `ctime()`, and for showing both date and time there is `cdatetime()`.

Calculations on dates work like you would expect:

In [None]:
date() - 1981-01-04

<div class="alert alert-success">**Exercise**: when were you 10,000 days old?</div>

### Times

The function `time()` gives the time (the current time if no argument is given) in *radians*. This makes it possible to write times in the same way as angles:

In [None]:
time() > 12h38m

<div class="alert alert-success">**Exercise**: check that `datetime() - date()` (which gives a result in days) is consistent with `time()`.</div>

To convert a time to a string, use the function `ctime()` (remember it as "*see time*"), or `cdatetime()` to include the date.

In [None]:
ctime(5000 s)
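Both ideas in this section can be sketched in Python: a full day of 24 hours corresponds to 2π radians, and a duration in seconds splits into hours, minutes and seconds. The helper names are hypothetical, and TaQL's `ctime` output may include fractional seconds:

```python
import math

def time_to_rad(hours, minutes=0, seconds=0.0):
    """Time of day as an angle: 24 hours correspond to 2*pi radians."""
    return (hours + minutes / 60 + seconds / 3600) / 24 * 2 * math.pi

def format_duration(seconds):
    """Format a whole number of seconds as hh:mm:ss."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

print(time_to_rad(12))        # pi: noon is half a turn
print(format_duration(5000))  # 01:23:20
```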

## Measures

The prefix `meas.` is for functions linking to casacore's *measures* library. These functions make it possible to convert measures like directions, epochs, and positions from one reference frame to another.

### Times

To do really accurate computations with times, one should use Measures. When you specify a time, it is interpreted with respect to the `UTC` frame (Coordinated Universal Time). To convert to a different frame, e.g. `TAI` (International Atomic Time), use `meas.epoch`:

In [None]:
cdatetime(meas.epoch("TAI", 2016-01-28 15:00:00, "UTC"))

Since the default time frame is `UTC`, it may be omitted.

As you see, there is a discrepancy between `UTC` and `TAI`. This is due to leap seconds. These leap seconds are announced only about six months in advance (for example, here's the [announcement](ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.49) for the 2015 leap second). This is one of the reasons that you sometimes get warnings if your casacore data directory is out of date.

In [None]:
meas.epoch("TAI","30-Jun-2015")-meas.epoch("UTC","30-Jun-2015")

In [None]:
[3m]

As you see, a leap second was inserted in `UTC` between June and July 2015. `TAI` does not include leap seconds; apart from that, the two time scales are the same.
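The offset between the two scales is a step function of the date. A Python sketch with an abbreviated, hard-coded table (only entries from 2009 onwards; casacore reads the complete history from its data directory, which is why that directory must be kept up to date):

```python
from datetime import date

# TAI - UTC in seconds, valid from the given UTC date onwards (abbreviated)
TAI_MINUS_UTC = [
    (date(2009, 1, 1), 34),
    (date(2012, 7, 1), 35),
    (date(2015, 7, 1), 36),
    (date(2017, 1, 1), 37),
]

def tai_minus_utc(d):
    """Return TAI - UTC in seconds for a UTC date on or after 2009-01-01."""
    offset = None
    for start, seconds in TAI_MINUS_UTC:
        if d >= start:
            offset = seconds
    if offset is None:
        raise ValueError("date before the start of this abbreviated table")
    return offset

# The leap second at the end of 30 June 2015 appears as a jump of one second
print(tai_minus_utc(date(2015, 6, 30)), tai_minus_utc(date(2015, 7, 1)))  # 35 36
```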

<div class="alert alert-success">**Exercise**: calculate the number of seconds between `1997-01-01 00:00 UTC` and `2000-01-01 00:00 UTC`, and explain why the answer is *not* `94608000 s`.</div>

Available time frames are:

`"LAST"` (Local Apparent Sidereal Time), `"LMST"` (Local Mean Sidereal Time), `"GMST1"` (Greenwich Mean Sidereal Time), `"GAST"` (Greenwich Apparent Sidereal Time), `"UT1"`, `"UT2"` (Universal Time), `"UTC"` (Coordinated Universal Time), `"TAI"` (International Atomic Time), `"TDT"` (Terrestrial Dynamical Time), `"TCG"` (Geocentric Coordinate Time), `"TDB"` (Barycentric Dynamical Time), `"TCB"` (Barycentric Coordinate Time).

In [None]:
ctime(meas.epoch("LMST", datetime(), "UTC", "WSRT"))

In [None]:
hdms(meas.position("WGSLL", "WSRT"))

### Positions

Positions on Earth must be given with respect to a reference frame. Two important reference frames are `WGS84` and `ITRF`. Positions can be converted between reference frames with the function `meas.position` (or `meas.pos`).

In [None]:
meas.position("ITRF", [6.60417, 52.91692] deg, "WGS")
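The geometry behind this conversion can be sketched with the standard WGS84 geodetic-to-Cartesian formulas. This sketch assumes, as an approximation, that the WGS84 and ITRF Cartesian axes coincide; casacore performs its own, more careful conversion, so the numbers will not match exactly:

```python
import math

# WGS84 ellipsoid parameters
A = 6378137.0          # semi-major axis in meters
F = 1 / 298.257223563  # flattening
E2 = F * (2 - F)       # first eccentricity squared

def geodetic_to_ecef(lon_deg, lat_deg, height_m=0.0):
    """Convert WGS84 longitude/latitude/height to Earth-centred Cartesian meters."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Prime-vertical radius of curvature at this latitude
    n = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - E2) + height_m) * math.sin(lat)
    return x, y, z

print(geodetic_to_ecef(6.60417, 52.91692))
```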

Since `WGS` is the default, it may be omitted.

The positions of most radio telescopes are predefined:
`"ALMA"`, `"ARECIBO"`, `"ATCA"`, `"BIMA"`, `"CLRO"`, `"DRAO"`, `"DWL"`, `"GB"`, `"GBT"`, `"GMRT"`, `"IRAM PDB"`, `"IRAM_PDB"`, `"JCMT"`, `"MOPRA"`, `"MOST"`, `"NRAO12M"`, `"NRAO_GBT"`, `"PKS"`, `"SAO SMA"`, `"SMA"`, `"VLA"`, `"VLBA"`, `"WSRT"`, `"ATF"`, `"ATA"`, `"CARMA"`, `"ACA"`, `"OSF"`, `"OVRO_MMA"`, `"EVLA"`, `"ASKAP"`, `"APEX"`, `"SMT"`, `"NRO"`, `"ASTE"`, `"LOFAR"`, `"MeerKAT"`, `"KAT-7"`, `"EVN"`, `"LWA1"`, `"PAPER_SA"`, `"PAPER_GB"`, `"e-MERLIN"`, `"MERLIN2"`, `"Effelsberg"`, `"MWA32T"`, `"AMI-LA"`

By default, `meas.position` returns Cartesian coordinates in meters from the Earth's center. By appending `LL` to the frame code, you get longitude and latitude instead.

In [None]:
meas.position("WGSLL", "WSRT") deg

<div class="alert alert-success">**Exercise**: compute the angular distance between ALMA and MeerKAT.</div>

### Directions

Casacore knows many celestial reference frames. Directions are converted between them with `meas.dir`:

In [None]:
meas.dir("GALACTIC", [-6h52m36.7, 34d25m56.1], "J2000")

Since `J2000` is the default, it may be omitted.

Several directions have been predefined, like all the planets, the sun and the moon, and standard sources ("`CasA`", "`CygA`", "`TauA`", "`VirA`", "`HerA`", "`HydA`", "`PerA`").

If you want to convert to a coordinate frame which is tied to the Earth, it is necessary to also specify a time and a position.

In [None]:
meas.dir("AZEL", "Jupiter", datetime(), "WSRT")

The reference frames of the time and the position can be given explicitly (and should be if they are not `UTC` and `WGS84`, respectively):

In [None]:
meas.dir("AZEL", "Jupiter", 2000-01-01 00:00, "TAI", 
        [3826577.110, 461022.900, 5064892.758] m, "ITRF") deg

Supported reference frames are:

There is a special function to see when a source will be visible on a given day:

In [None]:
meas.riseset("SUN", date()+[1:4], [6.60417, 52.91692] deg)

In [None]:
select time()

In [None]:
select TIME from demo.MS limit 1

In [None]:
datetime("4871282513.01 s")

<div class="alert alert-success">**Exercise**: when will Cassiopeia A rise tomorrow?</div>

In [None]:
meas.riseset("SUN", date()+5h, [6.60417, 52.91692] deg)

In [None]:
select from demo.MS where TIME in (select rand() limit 3)

In [None]:
select cdate(d[0]), 
       ctime(d[0]), str(d[1],"TIME") 
       from [
         select meas.riseset('SUN',1jan16+rowid(),'UTC',[5d0m,52d0m]) as d limit 31
         ]

In [None]:
select cdate(d[0]), 
       str(d[0],'TIME'), str(d[1],'TIME') 
       from [
         select meas.riseset('SUN',1jan16+rowid(),'UTC',[5d0m,52d0m]) as d limit 31
         ]

## Tables

In [None]:
select DATA, WEIGHT from ~/projects/tim/tim.MS limit 2

Columns

limit, offset, etc

Storing the output

### Using groupby

## Structure of a Measurement Set

Example with subquery

Example with mscal

## Baseline selection syntax