# Sensor-Prediction
***
This project is about predicting sensor readings in real time. The dataset selected for the project contains sulphur hexafluoride (SF6) gas readings taken after the gas was released into the wind. The dataset is described below.

# Dataset

## Introduction
***
- In the Joint Urban 2003 field study, SF6 tracer gas was released during the 10 Intensive Observation Periods (IOPs). 
- Each IOP included two or three continuous 30-min releases and 3-6 instantaneous puff releases.
- For the field study as a whole there were 29 continuous releases (17 daytime and 12 nocturnal) and 40 puff releases (25 daytime and 15 nocturnal). 
- All of the releases were from the built-up downtown urban area.

> As this project deals with real-time prediction of sensor readings, the 29 continuous releases are the area of interest.

> However, the given data also includes the puff releases.

- The study also took real-time, fast-response tracer measurements using 10 TGAs mounted in vans, all but one at a fixed location for the duration of any given release period. 
- One TGA was driven back and forth continuously to obtain a cross-section of the plume.
- Additional fast-response sampling was done using Miran samplers in proximity to the release.

> The tracer measurements from the 10 TGAs are fast-response readings, equivalent to sensor readings. Hence, these readings are used to build the model.


## 3D Sonic

- Each sub-directory contains meteorological measurement data from one of the project collaborators.

## Fast Response SF6

- This directory contains two sub-directories, LLNL Miran and TGA.
- The TGA measurements were made using electron capture detection at levels from as low as about 20 ppt to about 25,000 ppt. 
- The TGA directory contains the majority of the SF6 fast-response measurements.
- A separate Readme in the TGA directory describes the content of the files.

> The TGA directory has the tracer readings, which are used to build the model.

## Other directories

### Profile Data

- This directory contains meteorological profile measurements from various sources including ANL, FRD, and PNNL minisodars (wind speed and direction), PNNL RASS (temperature), PNNL radar wind profiler (wind speed and direction), and PNNL rawindsonde. 
- Each measurement has its own sub-directory and corresponding Readme.

###  PWID&Hobo&PNNLmet

- This directory contains wind speed, wind direction, temperature, relative humidity (RH), pressure, and solar radiation measurements at sites in the field study area. 
- Separate Readme files are available for each sub-directory.

> These other directories contain factors that we might need to account for when building the model. Try to include these factors after building an initial model.

## TGA

### TGA analyzer

- The TGA-4000 real-time SF6 analyzer is a fast response instrument (response time less than 1 sec) designed specifically to measure the concentration of SF6 in ambient air.
- The detection limit of the electron capture detector (ECD) is about 5 parts per trillion by volume (pptv) under optimum laboratory conditions. The maximum concentration is about 10,000 pptv, but this can be doubled with the aid of a dilution system. 
- The operator log books and concentration plots were carefully reviewed for any anomalies that required the QC flags to be set.

> The QC flags described below can be used to detect these anomalies.

### QC Flags

- Key points about the QC flags:

    - The operator log books and concentration plots were carefully reviewed for any anomalies that required the QC flags to be set.  
    - The review looked specifically for instrument over range, dilution system usage that was not detected, starting or stopping of the dilution system during a peak and van movements during a peak.  
    - Any other problems were also noted.  From this review, a list of flags that needed to be set was generated and entered into the computer.  
    - These were combined with the data during the generation of final data files so that users would be aware of any questionable data.  
    - The flag values are defined as:

Value | Description
----- | -----------
0 |   good data.
1 |   concentration less than the limit of quantitation (LOQ) but greater than the limit of detection (LOD); treat as an estimate. (See the note on the dilution system in the TGA README.) 
2 |   concentration less than LOD; not statistically different from 0; treat as 0 or null value. (See the note on the dilution system in the TGA README.)
3 |   concentration is greater than 115% of the highest calibration; treat as an estimate.
4 |   instrument over ranged its output; concentration is unusable.
5 |   null values.  Analyzer was in position and operating correctly and no SF6 was found.  Treating these concentrations as 0 is appropriate.
6 |   analyzer was not in use. No data available. Do NOT treat these as 0. Flag 6 indicates a human decision to not operate. For example: leave and do calibrations, move to a new place, we don't need you for this test, etc.
7 |   analyzer was broken.  No data available.  Do NOT treat these as 0 values.  Concentrations are unknown.
8 |   analyzer was operating, but was experiencing problems. Treat all concentrations as estimates.
9 |   concentrations are unusable because of instrument problems, but are included for qualitative indications only.  In this case, the instrument was operating and collected data, but problems discovered later made it impossible to have any confidence at all in the  concentrations.  Since the data was available it was  included and may be useful for some purposes such as  determining arrival times, etc.  Calculations should  not be done with these concentrations.
10 |   concentrations unusable because of external problems. For example: fugitive sources, noise caused by trucks passing, etc.
11 |  concentrations are estimates because of external problems. This flag indicates that something external to the analyzer had a small effect on the data, making it less certain but not totally unreliable.  For example: a passing truck  creating a small amount of noise during a high concentration peak.

For more explanation, read the README file in the TGA folder.
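
As a sketch of how the flag table might be applied when preparing training data, the function below maps each QC flag to a usable concentration. The grouping follows the table above; the function and set names, and the choice to return 0 rather than null for flag 2, are assumptions of this example rather than part of the dataset documentation.

```python
from typing import Optional, Tuple

GOOD = {0}                    # use as-is
ESTIMATE = {1, 3, 8, 11}      # usable, but treat as estimates
AS_ZERO = {2, 5}              # below LOD (2) or no SF6 found (5): treat as 0
UNUSABLE = {4, 6, 7, 9, 10}   # over range, not in use, broken, or unusable


def usable_concentration(conc_pptv: float, flag: int) -> Tuple[Optional[float], bool]:
    """Return (value, is_estimate); value is None when the reading must not be used."""
    if flag in GOOD:
        return conc_pptv, False
    if flag in ESTIMATE:
        return conc_pptv, True
    if flag in AS_ZERO:
        # The table also allows a null value for flag 2; this sketch chooses 0.
        return 0.0, False
    if flag in UNUSABLE:
        # Flags 6 and 7 in particular must NOT become 0: the concentration is unknown.
        return None, False
    raise ValueError(f"unknown QC flag: {flag}")
```

Returning `None` for flags 6 and 7 (rather than 0) keeps missing data distinguishable from flag 5, where the analyzer was working and simply found no SF6. Flag 9 data is also dropped here, although it could still be used qualitatively, for example for arrival times.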

### Location of the analyzer

- For every SF6 peak marked by the operator at a given location, the median latitude and longitude from the "good" GPS positions in the file was determined.  
- "Good" GPS positions were defined to be all those with horizontal dilution of precision (hdop) less than or equal to 3.0 and the number of satellites in use greater than or equal to 4.

> The GPS position is considered good if HDOP is less than or equal to 3.0 and the number of satellites is greater than or equal to 4.
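
A minimal sketch of the "good" fix rule and the per-peak median position, assuming each GPS fix is available as a `(lat, lon, hdop, n_satellites)` tuple; the function names are hypothetical.

```python
from statistics import median
from typing import Iterable, Optional, Tuple

Fix = Tuple[float, float, float, int]  # (lat, lon, hdop, n_satellites)


def is_good_fix(hdop: float, n_satellites: int) -> bool:
    # "Good" GPS position: HDOP <= 3.0 and at least 4 satellites in use.
    return hdop <= 3.0 and n_satellites >= 4


def median_position(fixes: Iterable[Fix]) -> Optional[Tuple[float, float]]:
    """Median latitude and longitude of the good fixes recorded during one peak."""
    good = [(lat, lon) for lat, lon, hdop, sats in fixes if is_good_fix(hdop, sats)]
    if not good:
        return None  # no usable GPS fix for this peak
    return median(lat for lat, _ in good), median(lon for _, lon in good)
```

The median, rather than the mean, keeps a single spiky fix from dragging the position away from where the van actually stood.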


- The operator notebooks were then carefully reviewed to determine which peaks were measured at the same location.  Once a group of peaks was identified with a particular location, the median values of latitude and longitude were plotted on a map.  
- If the medians appeared to make a tight group at the appropriate location, it was assumed that the GPS worked reasonably well and the reported position was the average of these medians.  
- If the medians exhibited significant scatter (more than a few car lengths), appeared to be in the wrong location or there were too few to determine if the grouping was good, it was assumed that the GPS did not work well in that location and the GPS locations were not used.  

> Whether the GPS location values can be used depends on how tightly the per-peak median latitudes and longitudes cluster at the expected location.
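
The grouping check described above can be approximated in code. This is only a sketch: the README gives no numeric threshold, so `MAX_SCATTER_M` (15 m, roughly a few car lengths) and `MIN_MEDIANS` are hypothetical values, and the "wrong location" check against a map is not automated here. Distances use an equirectangular approximation, which is adequate over tens of meters.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0
# Hypothetical thresholds; the README only says "more than a few car lengths"
# and "too few" medians.
MAX_SCATTER_M = 15.0
MIN_MEDIANS = 3


def distance_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Approximate distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return EARTH_RADIUS_M * math.hypot(x, y)


def station_position(medians: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Average the per-peak medians if they form a tight group, else return None."""
    if len(medians) < MIN_MEDIANS:
        return None  # too few medians to judge the grouping
    center = (
        sum(p[0] for p in medians) / len(medians),
        sum(p[1] for p in medians) / len(medians),
    )
    if max(distance_m(center, p) for p in medians) > MAX_SCATTER_M:
        return None  # too scattered: GPS not trusted at this location
    return center
```

A `None` result corresponds to the case where the GPS positions were discarded and another position source had to be used.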

- In these cases, the positions were read off of high resolution satellite photos available from Terraserver.com.
- These photos had resolutions of about 6 inches per pixel and readily showed sidewalks, crosswalks, parking spaces, vehicles on the roads, etc.  
- The positions read off of Terraserver.com did not include altitudes, so the altitude was reported as -999 in the data files while those taken from GPS positions have an altitude reported in meters.  

> An altitude of -999 means the position was read off Terraserver.com because the GPS readings were not reliable; the van should be assumed to be at street level.

- If the altitude is -999, it should be assumed that the analyzer van was at street level.  (All vans were at street level except when analyzer 7 was parked on the top of the Main Street Parking Garage.  A GPS position was reported in this case.)

### Movement of the Analyzer

- When the analyzer vans were mobile, the GPS positions were reported in the data files.  Some caution is advised when using these since they will contain some spikes and erroneous readings.  There are also a few instances where the GPS lost its position and took several minutes to regain it.  

- These are most easily detected by looking at the HDOP (horizontal dilution of precision) and the number of satellites the GPS used.  Both of these values are included in the data files.  HDOP decreases as the position value improves.  

- Typically, reliable readings will have HDOP values of 4 or less.  Higher values are questionable and anything with an HDOP over 10 is generally very bad. 

- The position values also improve as the number of satellites increases.  At least 4 satellites are required for a good GPS position and more are better.  

- Any position with fewer than 4 satellites should be regarded as unreliable. These rules of thumb apply only to the mobile vans.  

> The HDOP and satellite thresholds above apply to moving vans only.

- Stationary vans that have averaged GPS positions or positions read off of Terraserver.com have HDOP=0 and number of satellites= -1 in the data files.

> For stationary vans, HDOP is 0 and the number of satellites is -1.
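
Putting the position conventions together, a row's position can be labeled from its HDOP, satellite count and altitude. This is a sketch: the label strings and function name are hypothetical, while the rules come from the README (stationary rows have HDOP 0 and satellites -1, Terraserver positions have altitude -999, and for mobile vans HDOP of 4 or less is reliable, over 10 is very bad, and fewer than 4 satellites is unreliable).

```python
MISSING_ALTITUDE = -999.0  # altitude written for positions read off Terraserver.com


def position_quality(hdop: float, n_satellites: int, altitude_m: float) -> str:
    """Label a row's position source or reliability (hypothetical labels)."""
    # Stationary vans: averaged GPS medians or Terraserver positions.
    if hdop == 0 and n_satellites == -1:
        if altitude_m == MISSING_ALTITUDE:
            return "stationary_terraserver"  # no altitude: assume street level
        return "stationary_gps"
    # Mobile vans: README rules of thumb.
    if n_satellites < 4:
        return "unreliable"
    if hdop <= 4:
        return "reliable"
    if hdop > 10:
        return "very_bad"
    return "questionable"
```

Checking the stationary sentinel first matters: otherwise `n_satellites == -1` would fall into the "unreliable" branch meant for mobile fixes.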

- Van 5 was always mobile.  Other vans were inadvertently mobile for one peak measurement on three occasions: van 3 in IOP 4, van 2 in IOP 5, and van 8 in IOP 5.

- These occurred when the analyzer unexpectedly measured significant SF6 while driving to or leaving a stationary position.

### Review of final data files

- After the final data files were created, they were carefully reviewed for any problems. Each of the 390 data files was read into Excel and each column was plotted versus time.  
- The concentrations were compared to the earlier peak plots to verify that all the peaks were included at the correct time.
- The position variables (longitude, latitude, altitude, hdop, number of satellites) were plotted and reviewed to verify that van movements were accurately reflected in the data files.  
- Longitudes and latitudes were checked to verify that the correct ones were being included.  
- The QC flags were checked visually by plotting and by computer programs that listed start and stop times for each flag and the range of concentrations flagged with a 1 or 2.  
- These lists were then compared with the lists generated earlier in the QC process.  
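
The flag listing step can be sketched as a run-length pass over one file's rows. This hypothetical helper reports the start time, stop time and concentration range of every run of non-zero flags; the README's programs reported concentration ranges only for flags 1 and 2, so this output is a superset of theirs.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def flag_intervals(
    times: Sequence[float], concs: Sequence[float], flags: Sequence[int]
) -> List[Tuple[int, float, float, float, float]]:
    """Return (flag, start, stop, min_conc, max_conc) for each run of non-zero flags."""
    intervals = []
    rows = zip(times, concs, flags)
    # groupby splits the rows into consecutive runs that share the same flag.
    for flag, run in groupby(rows, key=lambda r: r[2]):
        if flag == 0:
            continue  # good data is not listed
        run_rows = list(run)
        run_concs = [c for _, c, _ in run_rows]
        intervals.append(
            (flag, run_rows[0][0], run_rows[-1][0], min(run_concs), max(run_concs))
        )
    return intervals
```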

## FILE FORMATS

- There is one file for each analyzer (or van) for each release.  The entire puff release period is counted as one release and is included in a single file.  
- There were typically 4 release periods per IOP.  Each file contains 1.5 hours of data sampled at 2 Hz.  Files are named:


     IxxRyVz.CSV

where:

- `xx` is the IOP number (01 to 10)
- `y` is the release number (1 to 4)
- `z` is the analyzer or van number (0 to 9)
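
A minimal sketch for parsing the file names, assuming the pattern above is followed exactly (matched case-insensitively here). It also records the nominal row count implied by 1.5 hours at 2 Hz: 1.5 * 3600 * 2 = 10,800 samples. Actual files may differ slightly, so treat that number as a sanity check rather than a guarantee. The names `TgaFile` and `parse_tga_filename` are hypothetical.

```python
import re
from typing import NamedTuple

# IxxRyVz.CSV: xx = IOP (01-10), y = release (1-4), z = analyzer/van (0-9).
FILE_RE = re.compile(r"^I(\d{2})R(\d)V(\d)\.CSV$", re.IGNORECASE)

# Nominal rows per file: 1.5 h * 3600 s/h * 2 samples/s.
EXPECTED_ROWS = int(1.5 * 3600 * 2)  # 10,800


class TgaFile(NamedTuple):
    iop: int
    release: int
    van: int


def parse_tga_filename(name: str) -> TgaFile:
    """Split a file name such as 'I03R2V7.CSV' into IOP, release and van numbers."""
    m = FILE_RE.match(name)
    if m is None:
        raise ValueError(f"not a TGA data file name: {name!r}")
    iop, release, van = (int(g) for g in m.groups())
    if not (1 <= iop <= 10 and 1 <= release <= 4):
        raise ValueError(f"IOP or release number out of range: {name!r}")
    return TgaFile(iop, release, van)
```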

> Each file holds the SF6 concentrations measured by one analyzer over one release period; the entire puff release period is stored as one such file.

The columns in each file are:

1. day of year (corresponds to UTC time)
2. UTC hours
3. UTC minutes
4. UTC seconds (as ss.s)
5. IOP number
6. analyzer or van number
7. pass number
8. hours in Central Daylight Time on day of test start (hh.hhhhhhh)
9. latitude (degrees)
10. longitude (degrees)
11. altitude (meters)
12. number of satellites in use by GPS
13. hdop
14. concentration of SF6 (in parts per trillion by volume)
15. QC flag
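
A sketch of a reader for these columns using only the standard library. The column labels are our own shorthand, and the sketch assumes the files have no header row (pass `has_header=True` if they do); whole-number columns are converted to `int` and the rest stay `float`.

```python
import csv
from typing import Dict, Iterable, Iterator, Union

# Our own labels for the 15 documented columns, in file order.
COLUMNS = [
    "doy", "utc_hour", "utc_min", "utc_sec", "iop", "van", "pass_no",
    "hours_cdt", "lat", "lon", "alt_m", "n_sats", "hdop", "sf6_pptv", "qc_flag",
]
# Columns that hold whole numbers; all others stay as float.
INT_COLUMNS = {"doy", "utc_hour", "utc_min", "iop", "van", "pass_no", "n_sats", "qc_flag"}


def read_tga_rows(
    lines: Iterable[str], has_header: bool = False
) -> Iterator[Dict[str, Union[int, float]]]:
    """Yield one dict per data row from an open file or any iterable of CSV lines."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row: Dict[str, Union[int, float]] = {}
        for name, raw in zip(COLUMNS, fields):
            # Python floats are IEEE-754 doubles, so lat/lon/hours keep full precision.
            value = float(raw)
            row[name] = int(value) if name in INT_COLUMNS else value
        yield row
```

Typical use would be `with open("I03R2V7.CSV", newline="") as f: rows = list(read_tga_rows(f))`.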

NOTES:

1. Pass number counts the number of times the analyzer has made a pass
through the plume or where the plume was expected to be.  It is only
meaningful for mobile analyzers.  Stationary analyzers have this column
set to -1.

2. Hours CDT gives times as hours (hh.hhhhhhh) on the day the IOP
began.  If the IOP extended across midnight, this column will increase past
24.  It does NOT reset to 0.  This provides a continuously increasing time
value that is convenient for some processing.

3. Double precision variables should be used when processing the hours,
latitude, and longitude fields.  Single precision numbers may not provide
expected resolution.