Incubator Report

acbecker edited this page Dec 15, 2014 · 18 revisions

Key personnel

  • Project lead: Andrew Becker, UW Astronomy
  • eScience Liaison: Daniel Halperin, UW eScience Institute
  • With assistance from: Andrew Whitaker, Bill Howe

Project Overview

Kernel-Based Moving Object Detection (KBMOD) describes a new technique to discover faint moving objects in time-series imaging data. The essence of the technique is to filter each image with its own point-spread function (PSF) and normalize by the image noise, yielding a likelihood image in which the value of each pixel represents the likelihood that there is an underlying point source. We wish to search for objects that have low S/N in any single image (e.g. a pixel value between 1 and 3), but whose cumulative S/N, when the signal is aggregated over the multiple images in which the objects appear, is significant enough to claim a detection (e.g. greater than 10). The core functionality of KBMOD is the process of running a detection kernel along putative moving-object trajectories and summing the likelihood values wherever a trajectory intersects a science image.
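As a rough illustration (not the project's actual detection code), the aggregation step can be sketched in Python. Here we assume the per-image likelihood values are already noise-normalized, so N equal-variance measurements combine as their sum divided by sqrt(N):

```python
import math

def combined_snr(likelihoods):
    """Combine per-image, noise-normalized likelihood values.

    For N independent measurements with unit noise, the summed signal
    grows as N while the noise grows as sqrt(N), so the combined S/N
    is sum(values) / sqrt(N).
    """
    n = len(likelihoods)
    if n == 0:
        return 0.0
    return sum(likelihoods) / math.sqrt(n)

# An object at S/N ~ 2 in each of 25 images is individually
# sub-threshold, but combines to S/N 10: enough to claim a detection.
per_image = [2.0] * 25
print(combined_snr(per_image))  # -> 10.0
```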

The first step in this process, implemented during the Fall 2014 eScience Data Incubator project, was to examine a database-backed solution for data access and query implementation. PostgreSQL was chosen as the database implementation, primarily because of the PostGIS spatial extension, which provides native spherical-geometry objects and queries. Since the package was originally designed to represent Earth-based geographic information, one minor detail is ensuring that geometric objects are represented on an ideal sphere (I believe this is the correct spatial reference: http://spatialreference.org/ref/epsg/3786/) rather than on Earth's ellipsoid.
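On an ideal sphere, separations between sky positions are great-circle distances rather than Euclidean ones. A minimal Python sketch of the spherical separation (independent of PostGIS, purely for illustration), using the haversine formula:

```python
import math

def great_circle_deg(ra1, dec1, ra2, dec2):
    """Angular separation (degrees) between two sky positions on an
    ideal sphere, via the haversine formula. ra/dec in degrees."""
    p1, p2 = math.radians(dec1), math.radians(dec2)
    dl = math.radians(ra2 - ra1)
    dp = p2 - p1
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return math.degrees(2 * math.asin(math.sqrt(a)))

# One degree of Right Ascension along the celestial equator is one
# degree of true angular separation:
print(great_circle_deg(10.0, 0.0, 11.0, 0.0))
```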

We started the project running a PostgreSQL database on an Amazon Relational Database Service (RDS) instance, but ran into two limitations: you cannot log in to the machine (e.g. via ssh) to copy data locally for ingest, and you cannot install C-language User-Defined Functions (UDFs). The latter requirement arose from the desire to replicate, in the database, the WCSLIB mapping of sky coordinates to image pixels, which is derived from metadata contained in the image headers. This necessitated moving the database to an Elastic Compute Cloud (EC2) instance, where we had complete sysadmin control of the system.

The bulk of the work during this incubator was in designing database tables, and queries on those tables, for the purpose of intersecting the space-time trajectories of moving objects with our imaging dataset. In shorthand, we wanted to find out which Images a moving object intersected, at which sky coordinate inside each Image (in the 2-D sky plane defined by the Right Ascension and Declination coordinate system), and finally which x,y pixel that coordinate corresponds to. Three table versions were implemented, which can be reduced to the maximal and minimal table designs described below.


Maximal Table Design

|                | Run       | Set       | Image   | Pixel   |
|----------------|-----------|-----------|---------|---------|
| Primary Key    | runId     | setId     | imageId | pixelId |
| Foreign Key    | ---       | runId     | setId   | imageId |
| Spatial Geom   | bbox      | bbox      | bbox3d  | intIdx  |
| Temporal Geom  | tmin,tmax | tmin,tmax | bbox3d  | ---     |
| Number Entries | 3E+02     | 5E+03     | 2E+06   | 5E+12   |

In this relatively complete description, we replicate the hierarchy of the imaging data in database tables. Each Run (e.g. night of observation) is made up of several Sets of data. For the SDSS survey, which we are using as a reference data set to help design KBMOD, a Set would consist of a unique [camcol,filter]-defined contiguous strip of pixel data; for LSST a Set would correspond to an Exposure. A Set is made up of several Images, implemented as FITS files, each of which is an array of Pixels. This maximal table design captures these inter-relationships through foreign keys linking each lower element in the hierarchy to its parent.

We encapsulate the temporal width of each Run, Set, and Image by a pair of TIMESTAMP WITH TIME ZONE values corresponding to the start and end times of each object. For SDSS, this temporal window is 53.9 seconds for an Image; the windows for the Sets within a Run are essentially identical to one another, and correspond to the night's worth of observations.

We encapsulate the spatial extent of each Run, Set, and Image using PostGIS Polygon bounding boxes. We address the x,y coordinate of a Pixel in an Image by a single integer (intIdx) representing its location in the flattened array. Finally, a 3-dimensional bounding box is built for each Image (vs. the 2-D boxes for Run and Set), where the first two dimensions represent its spatial extent and the third its temporal extent.
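The flattened-index convention can be sketched as follows; a row-major layout and the function names are assumptions for illustration, not necessarily the convention used in the actual tables:

```python
def to_int_idx(x, y, width):
    """Map an (x, y) pixel coordinate to a single flattened index,
    assuming row-major order with `width` pixels per image row."""
    return y * width + x

def from_int_idx(int_idx, width):
    """Invert the mapping: recover (x, y) from the flattened index."""
    return int_idx % width, int_idx // width

# Round-trip a coordinate in a 2048-pixel-wide image:
print(to_int_idx(100, 3, 2048))    # -> 6244
print(from_int_idx(6244, 2048))    # -> (100, 3)
```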

We found the ingestion of the Pixel data to be the largest bottleneck, with serial ingest of all the data estimated to take on the order of ~2 years. Since this operation is redundant (the pixels are already stored in an ordered object, the native FITS file), we considered removing pixel-value access from the database side entirely. We instead consider the possibility of accessing pixel-value information by running many trajectories at once, grouping them by the Image they intersect, and performing a single FITS open-read-close operation per Image to access the relevant information. In this paradigm the Pixel table is unnecessary, which leads to our minimal design below.
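A sketch of that grouping strategy in Python; the hit-tuple shape and the use of astropy.io.fits here are illustrative assumptions, not the project's actual code:

```python
from collections import defaultdict

def group_hits_by_image(hits):
    """Group (image_path, x, y) query results by image, so that each
    FITS file is opened exactly once regardless of how many
    trajectories intersect it."""
    by_image = defaultdict(list)
    for image_path, x, y in hits:
        by_image[image_path].append((x, y))
    return by_image

def read_pixels(hits):
    """One open-read-close per image; assumes astropy is available."""
    from astropy.io import fits
    values = {}
    for image_path, coords in group_hits_by_image(hits).items():
        with fits.open(image_path) as hdul:
            data = hdul[0].data
            for x, y in coords:
                values[(image_path, x, y)] = data[y, x]
    return values

hits = [("im1.fits", 10, 20), ("im2.fits", 5, 5), ("im1.fits", 11, 21)]
print(group_hits_by_image(hits)["im1.fits"])  # -> [(10, 20), (11, 21)]
```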


Minimal Table Design

|                | Run       | Image   |
|----------------|-----------|---------|
| Primary Key    | runId     | imageId |
| Foreign Key    | ---       | setId   |
| Spatial Geom   | bbox      | bbox3d  |
| Temporal Geom  | tmin,tmax | bbox3d  |
| Number Entries | 3E+02     | 2E+06   |

Using this design, we perform the following query to intersect a Trajectory (defined by a starting point and time plus a spatial velocity: x0,y0,t0,x',y') with the dataset, first culling at the largest scale to find which Run-Trajectory combinations have intersecting elements:

  • The Trajectory position is evaluated at the starting and ending times of each Run, and a 3-D line is generated which represents the spatial-temporal motion (or extent) of the object during the Run.

  • The resulting 3-D lines are then intersected with the Image bbox3d using the overlap operator &&&.

  • The ra,decl position of a Trajectory within an Image is determined by evaluating its position at the mid-time of the Image.

  • [Optionally] The UDF is used to map the sky position to Image x,y using the WCS transformation. This step is optional since it may proceed outside the database, where we would have more control over the number of times a WCS object is instantiated to evaluate the coordinate mappings.
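The linear trajectory evaluation used in the first and third steps above can be sketched as follows (times as plain floats in seconds here; the database instead uses TIMESTAMP values and EXTRACT(EPOCH ...)):

```python
def trajectory_position(ra0, dec0, t0, delta_ra, delta_dec, t):
    """Evaluate a linear sky trajectory at time t: the starting point
    advanced by the angular velocity for the elapsed time (t - t0)."""
    dt = t - t0
    return ra0 + dt * delta_ra, dec0 + dt * delta_dec

def trajectory_line_3d(ra0, dec0, t0, delta_ra, delta_dec, tmin, tmax):
    """Build the two (ra, dec, t) endpoints of the 3-D space-time line
    spanning a Run, analogous to the ST_MakeLine call in the query."""
    ra_a, dec_a = trajectory_position(ra0, dec0, t0, delta_ra, delta_dec, tmin)
    ra_b, dec_b = trajectory_position(ra0, dec0, t0, delta_ra, delta_dec, tmax)
    return (ra_a, dec_a, tmin), (ra_b, dec_b, tmax)

# An object at (ra, dec) = (150.0, 2.0) at t0 = 0, moving 0.001 deg/s
# in Right Ascension, evaluated 100 seconds later:
print(trajectory_position(150.0, 2.0, 0.0, 0.001, 0.0, 100.0))  # -> (150.1, 2.0)
```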


Two important query metrics for this project are "how many trajectories can we concurrently intersect with our dataset?" and "how long does this query take per trajectory?" Our initial table design was queried using 1 hard-coded Trajectory, which took nearly 8 seconds. We applied several optimizations beyond table design to speed up these queries, including covering indexes and clustered indexes. By the end of the quarter, we ran a batch of 10 queries, each intersecting 5000 trajectories with the entirety of the Image data, yielding an average of 34 seconds per batch, or 7 milliseconds per Trajectory.

Query timing for KBMOD

Figure 1. KBMOD query timing plot. We plot, as a function of the week of the eScience incubator project, the per-Trajectory time to intersect the Trajectory and Image tables. The size of each point is proportional to the number of entries in the Trajectory table during the query.

Future Work

Next steps include probing the lower bound on this query time, as well as investigating how to optimally access the pixel data.

Project Links

  • Final presentation

  • Final SQL Query (note: the TI table is technically unnecessary, but it allows the UDF in the final SELECT to use raInt and decInt):


-- Intersect Trajectories with the imaging data in two stages: the inner
-- CTE (TR) culls at the Run level; the outer CTE (TI) refines to
-- individual Images.
WITH TI AS (
    WITH TR AS (
        -- Stage 1: for each Trajectory-Run pair, build the 3-D
        -- (ra, dec, epoch) line spanning the Run's time window.
        SELECT
            traj.trajId as trajId,
            traj.ra0 as ra0,
            traj.dec0 as dec0,
            traj.t0 as t0,
            traj.delta_ra as delta_ra,
            traj.delta_dec as delta_dec,
            ST_MakeLine(ST_MakePoint(traj.ra0  + EXTRACT(EPOCH FROM run.tmin-traj.t0) * traj.delta_ra,
                                     traj.dec0 + EXTRACT(EPOCH FROM run.tmin-traj.t0) * traj.delta_dec,
                                     EXTRACT(EPOCH FROM run.tmin)),
                        ST_MakePoint(traj.ra0  + EXTRACT(EPOCH FROM run.tmax-traj.t0) * traj.delta_ra,
                                     traj.dec0 + EXTRACT(EPOCH FROM run.tmax-traj.t0) * traj.delta_dec,
                                     EXTRACT(EPOCH FROM run.tmax))) as tline
        FROM
            Trajectory as traj,
            Run as run
        WHERE
            -- 2-D overlap (&&) between the Run bounding box and the
            -- Trajectory's sky path evaluated at the Run's start and
            -- end times.
            run.bbox &&
            ST_MakeLine(ST_MakePoint(traj.ra0  + EXTRACT(EPOCH FROM run.tmin-traj.t0) * traj.delta_ra,
                                     traj.dec0 + EXTRACT(EPOCH FROM run.tmin-traj.t0) * traj.delta_dec),
                        ST_MakePoint(traj.ra0  + EXTRACT(EPOCH FROM run.tmax-traj.t0) * traj.delta_ra,
                                     traj.dec0 + EXTRACT(EPOCH FROM run.tmax-traj.t0) * traj.delta_dec))
        AND
            -- Restrict to Runs ending within 30 days (2592000 s) of the
            -- Trajectory epoch t0.
            ABS(EXTRACT(EPOCH FROM run.tmax-traj.t0)) < 2592000
    )
    -- Stage 2: refine to individual Images with the 3-D overlap
    -- operator (&&&), evaluating the Trajectory's sky position at each
    -- Image's mid-time.
    SELECT
        im.imageId, TR.trajId,
        (TR.ra0  + EXTRACT(EPOCH FROM im.tmid-TR.t0) * TR.delta_ra) as raInt,
        (TR.dec0 + EXTRACT(EPOCH FROM im.tmid-TR.t0) * TR.delta_dec) as decInt
    FROM
        Image as im,
        TR
    WHERE
        im.bbox3d &&& TR.tline
)
SELECT
    TI.imageId, TI.trajId, TI.raInt, TI.decInt
FROM
    TI
;