Skip to content

Latest commit

 

History

History
443 lines (425 loc) · 30.8 KB

cs310.md

File metadata and controls

443 lines (425 loc) · 30.8 KB

CS310

GNU General Public License v3.0 licensed. Source available on github.com/zifeo/EPFL.

Fall 2015: Introduction to computer vision

[TOC]

Introduction

  • what is an image
  • how do we create an image
  • how do we perceive an image
  • what is the relationship between the physical world and the digital values of an image
  • how do we process an image
    • to create a visually pleasing image
      • to extract information from the image
  • times and images
    • camera obscura (Aristoteles ~350 AC to Leonardo Da Vinci ~1500) : 2D light projection in a black room
    • magic lantern (Kircher 1631) : slide project ancester
    • camera lucida (Wollaston 1806) : prism enabling drawing and light project at the same time
    • light sensitive materials (Schulze 1727) : chalk + silver nitrate = silver chloride darken by sunlight
    • fixer (Herschel 1819) : determination of light sensitivities (absorption spectra) of silvers halides (binary chemical component with negative ion) and resolve unfix with sodium thiosulfate
    • first photograph (Niépce 1822) : zinc plate with light sensitive asphalt, non-exposed areas washed with turpentine (call heliograph)
    • daguerreotype (Dagueerre 1831) : first permanent photo processes (bought and free for usage by the Académie Française 1839)
    • calotype (Talbot 1835) : negative-positive process
    • 1st photo exhibit (St. Gallen 1840)
    • wet collodion process (Archer 1851) : replace daguerrotype by wet and more sensitive plate
    • gelatin dry plates (Maddox 1871) : even more sensitive
    • Kodak funded (Eastman 1880) : apaper roll film, cellulose nitrate
    • scanner (Kirsch 1954)
    • digital photography (Minsky 1966) : connected camera to TV
      • first digital camera (Sasson 1975) : resolution of 100x100- camera phone (2000)
    • social networks sharing

Image formation

  • color signal : $C_i(\lambda)=E(\lambda)S_i(\lambda)$
  • sensor response : $\rho=\int_\lambda E(\lambda)S(\lambda)R(\lambda)d\lambda$
  • spectral power distribution : radiant power per wavelenght describe light source
  • relative spectral power distribution : normalized maximum to 1 or to match 555nm
  • color signal at spatial position $x$ : $C(x,\lambda)\propto E(x, \lambda)S(x, \lambda)$ (surface characteristics and viewing geometry)
  • discrete model : samples taken every 10nm from 400 to 700nm $\rho_k(x)=\sum_{i=1}^{31} S(x, \lambda_i)E(x, \lambda_i)R_k(\lambda_i)$
  • discrete vector model : $\rho_k(x)=c(x)^Tr_k=s(x)^T\text{diag}(e(x))r_k$ and $P_{(n\times 3)}=S_{(n\times m)}^TD(E){(m\times m)}R{(m\times 3)}$
  • trichromatic imaging systems : red, blue, green $p(x)=[p_1(x),p_2(x),p_3(x)]$

Light and photometry

  • wave theory (Huygens)
    • fundamental : $\lambda=VT$ where $\lambda$ is wavelength, $V$ velocity and $T$ temporal period
    • in vacuum : $\lambda=cT$ where $c=2.998e^8m/s$ is speed of light
    • temporal frequency : $\nu=c/\lambda Hz$
    • spatial frequency : $f=1/\lambda$
  • wave as a function of space : $\gamma=a\sin(\frac{2\pi x}{\lambda})=a\sin(\phi)$
  • harmonic wave (time) : $\gamma=a\sin(\omega t)=a\sin(\frac{2\pi t}{T})=a\cos(\omega t - \pi/2)$
    • angular temporal frquency (angular speed): $\omega\equiv 2\pi/T=2\pi\nu$
    • monochromatic : single constant frequency and single wavelength (hard in pratice nonetheless very close and known as quasimonochromatic)
    • can be superposed
  • law of reflection : angle of incidence = angle of reflection (to the normal)
  • specular reflection : when obey to the law of reflection
  • diffuse reflection : when the reflection seems to be uniform (e.g. on powder)
  • spread reflection : mix of specular and diffuse reflection
  • refraction : light bending on medium change (slow down when entering medium denser) $n\sin(\theta)=n'\sin(\theta')$
    • refractive index : $n=\frac{c}{V}$
    • for wavelength 598nm : vacuum $n=1$, air $n=1.0029$ and glass $n=1.52$
  • diffraction : light enlarge as they pass by the edge of narrow aperture $\theta = \lambda/D$ where $D$ is aperture diameter
  • quantum theory (Planck) : electromagnetic wave = photo
    • energy released as electron drops towards the nucleus
    • only certain vibrational frequencies are allowed
    • everything is discrete
      • packets of electromagnetic energy = photos
      • number of photons = intensity of radiation
      • color (size) of photons = wavelength of radiation (frequency)
    • every object emits energy from its surface when above 0° Kelvin
  • energy of emmited radiation : $Q=h\nu=hc/\lambda$ in joules where $h=6.623e^{-34}$ (Planck's constant)
  • radiant power (flux) : $P(\lambda)$ radiant energy emmited per time unit in joules/seconde
  • quanta (number of photons) : $N_\nu=\frac{P_\lambda\lambda^3}{hc^2}=\frac{P_\lambda\lambda}{h\nu^2}$ per time unit
  • black body : object that can absorb and emit electromangetic radiation in all parts of the spectrum
  • spectral radiant exitance : $M_{e\lambda}=c_1\lambda^{-5}(e^{c_2/T\lambda}-1)^{-1} Wm^{-3}$ at temperature $T$ per unit wavelenght where $c_1=3.742e^{-16} Wm^2$ and $c_2=1.4388e^{-2} mK$
  • Wien's displacement law : $\lambda_{\text{max}}=\frac{2.897e^{-3}}{T}$ is the wavelength of peak emission
  • Stefan-Boltzmann's law : as temperate increase, amout of radiation at each wavelength increase $M_e=\int^\infty_0 M_{e\lambda}d\lambda=\sigma T^4 W/m^2$ where $\sigma=5.67e^{-8} W/m^2K^{-4}$
  • correlated color temperature : light color emitted of solid object can be completely specified by radiation laws thus a color temperature can be assigned to any light by visual matching
  • Commision international de l'Eclairage CIE standardization : A (2856K), C (6774K), D50 (5003K), D55 (5503K), D65 (6504K), D75 (7504K)
  • light sources
    • incadescent (continuous spectra): tungsten lamps, halogen lamps, LED (does not cover the whole spectrum)
    • luminsecent (discontinuous spectra) : electrical discharge lamps, fluorescent lamps, electronic flash
  • radiometry : measure all electromagnetic radiation equally
  • photometry (light) : measure electromagnetic radiation between 380 and 750nm and weights it according to the luminosity $\nu_\lambda$ function of the human eye
    • radiant flux : flow of photons per unit of time in Watt (joules/second)
    • lumen (lm) : photometric equivalent of the watt weighted to match eye response (1 watt at 555nm = 683 lumens)
    • luminous flux : $\Phi_L=683\int_{380}^{750}\nu(\lambda)\Phi_E(\lambda)d\lambda$ in lumens where $\Phi_E(\lambda)$ is the radiant flux per wavelength (Watts/nm) and $\nu(\lambda)$ the luminosity function
    • terminology
    • steradian : solid angle $\omega=A/r^2$ where $A$ is surface area (sphere contains $4\pi$ steradian)
    • intensity : 1 candela is a flux of 1 lumen per steradian in all directions
    • inverse square law : $E=\frac{I}{d^2}$ in lux where $I$ is the intensity in cd
    • luminance : $L=\frac{I}{A}$ in $cd/m^2$ with $A$ being the area
    • lambertian surface : reflects light in all directions (opposite of mirror) $L=\frac{ES}{\pi}$
    • Lambert's cosine law : $E_\theta=E\cos(\theta)$ with incident angle $\theta$
  • ultraviolet light : UV-A allow fluorescence (used as tanning lamps), UV-B destructive as enough energy to damage but not yet stop by the atmoshpere, UV-C is absorbed by the atmosphere within 100 meters
  • infared light : does have the required amount of energy to pass the detection threshold
  • effective irradiance : weighted irradiance in proportion to biological/chimical effect that light has on substance

Human visual system (HVS)

  • goal of digital photography : imitate processing of HVS
  • cornea : outer transparent structure that covers the iris, pupil and anterior chamber with refractive index 1.38 (shape variation responsible for near- and far- sightedness)
  • lens : transparent double convex controllbed by ciliary muscles (focusing = lens get fatter increasing refraction index, less flexible and yellowish with age)
  • iris : colored ring of muscule behind the cornea in front of lens regulates amount of light by adjusting the size of the pupil (eye color determined by concentration and distribution of melanin)
  • pupil : adjustable opening (circa 2-8mm)
  • retina : contains visual systems photosensitive cells
  • optical nerve : ~1 million fiber that carry information
  • blindspot : small area of retina where optic nerve enters the eye (no sensor there)
  • fovea : central part of retina with best spatial and color vision (~2° of visual angle) protected by macula (a yellow filter)
  • rods
    • low lumance vision also called scotopic (< 1cd/m^2)
    • 120 million per retina
    • primarily located outside fovea
    • one photopigment (rhodopsin) with peak spectral responsivity at ~510nm
  • cones
    • high luminance level also called photopic (> 100cd/m^2)
    • ~7 million per retina
    • primarily located inside fovea
    • three type of cones (small, medium and large)
  • mesotopic : in between scotopic (dark) and photopic (light)
  • visual signal processing
    • ganglion cell : receptive field
    • spatial contrast : positive ganglion input of one area surround by negative input (vice versa)
    • color contrast : positive ganglion input of one color surround by negative input of another color (vice versa)
  • opponent color theory : colors defined by R-G ($L-M+S$) and Y-B ($L+M-S$) as spatial details are transmitted as luminance signal ($L+M+S$)
  • dark and light adaptation : adjustment to lighting is not immediate (retain only relative luminance values)
  • tone mapping : maps scene radiances to display luminances (low dynamic range LDR to high dynamic range HDR)
  • trichromatic color theory : three receptors, three signals, ratio of three signals give color appearance
  • color imaging
    • additive representations : used for capture, processing tasks, additive display encodings (RGB, LMS, XYZ)
    • subtractive representations : used for print (CMYK, cyan, magenta, yellow, black)
    • opponent representations : used for compression, ordering systems, visual difference metrics (YCrCb, YUV, Lab)
  • lateral inhibition : light on one part of retina inhibits adjacent regions (machbands effect, on luminance narrow band boundaries edge are more distinct due to inhibition)
  • sharpening : increase local constrast
  • uniform field constrast : HVS response to local lumance variation depends on surrounding luminance
  • Weber-Fechner law : contrast $C_W=\Delta L/L$ (implies non-linear intensity percieved $I\propto\log(S)$ where $S$ is stimulus size)
  • threshold contrast : minimum contrast to detect change in intensity is nearly constant
  • gamma-characteristics : compensate HVS brightness non-linearity perception (taken into account in A/D conversion and image quality evaluations, knowning that +10/-10 is considered as equally perceived brightness)
  • contrast sensitivity functions : describe spatial and temporal eye properties (usually measured with periodic patterns at varying frequencies)
  • Michelson contrast : $C_M=\frac{L_{\text{max}}-L_{\text{min}}}{L_{\text{max}}+L_{\text{min}}}$ where $L$ is lumance of the pattern
  • visibility threshold : $VT$ is contrast when stimulus becomes visible
  • contrast sensitivity : $CS=1/VT$ (HVS low sensitivity used in compression, watermarking, encodings)

Fourier and convolution

  • fourier series : $f(x)=a_0+\sum_{n=1}^\infty a_n\cos(nux)+b_n\sin(nux)$
  • Euler's formula : $e^{inux}=\cos(nux)+i\sin(nux)$
  • complex fouier series : $f(x)=\sum_{n=-\infty}^\infty c_ne^{inux}$ where $a_n=c_n$ and $b_n=ic_n$ with $c_n=\frac{1}{T}\int_T f(x)e^{in2\pi/T x}dx$
  • fourier transform : $F(u)=\int_{-\infty}^\infty f(x)e^{-i2\pi ux}dx$ represents amout of frequency present in $f(x)$
  • gaussian function : $f(x)=e^{-1/2x^2\sigma^{-2}}$ and its tranform $F(u)=\sqrt{2\pi}\sigma e^{-1/2u^2\sigma^2}$ (fourier transform inverse variance)
  • 2D fourier transform : $F(u,v)=\sum_x\sum_y f(x,y)e^{-i2\pi ux}e^{-i2\pi v y}$
  • convolution : $h(x)=\int_{-\infty}^\infty f(\zeta)g(x-\zeta)d\zeta$
  • convolution in spatial domain is multiplication in frequency domain
  • point spread function (PSF) : image of a point of intensity is the PSF, images are formed by summing overlapping PSFs (small spread has high sharpness and resolution)
  • line spread function

Linear filtering

  • pixel (point) operations : brightness change (gamma-correction), contrast change, image difference, white balancing, colorspace conversions, histogram (reshape)
  • neighboring operations : smoothing (denoising), sharpening, edge detection
    • low-pass filter : integration (averaging) for approximations $g=\frac{1}{9}I_{9\times 9}$
    • high-pass filter : differentiation for images details $g=\begin{pmatrix}-1 & -1 & -1\\ -1 & 8 & -1\\ -1 & -1 & -1\end{pmatrix}$ (Laplacian is $4$ with $4\times -1$ arround)
    • unsharpening : subtracting a low-pass = adding a high-pass, improve edge but increase noise
    • noise (high frequency components) : come from acquisition, quantization, compression, transmission and can be reduce by low-pass (gaussian, sliding average), median filtering (best for salt and pepper) or averaging over many images
    • edge dection : image gradient $||\nabla||=\sqrt{(\frac{\partial f}{\partial x})^2+(\frac{\partial f}{\partial y})^2}$
    • sobel operation : $S=\begin{pmatrix}1 & 0 & -1\\ 2 & 0 & -2\\ 1 & 0 & -1\end{pmatrix}$
    • borders (convolution) : visible artefacts
      • zero padding : simple
      • periodic extension
      • mirror extension
    • correlation : $\phi(m,n)=f(-m,-n)*h(m,n)$ (convolution without flip)
    • fast-fourier transform (2D) : $O(n^2\log_2 n)$ FFT vs $O(n^4)$ classical
  • geometric operations

In camera processing

  • workflow
    1. RAW : color filter array data
    2. first-processing: linearization, dark current (current through photosensitive when no photon ~5-10 electrons) subtraction, flare (optical light parasite) removal and channel balanced
    3. color demosaicking : produce populated RGB channel
    4. color space conversion : input scene-referred image data
    5. rendering to vitual display : output-referred image data (sRGB)
  • RAW data : device-specific, can achieve best image processing if camera characterization data is maintained
  • photosite : collect entering stimuli, varies between 1.5-8 microns in digital camera
  • pixel pitch : distance between adjacent photosites, determines the sampling frequency (smaller give higher)
  • photosensitive area : active area determines sensitivity, sensor-different (full-frame CCD ~80%, interline CCD ~30%, CMOS ~30% over electronic circuitry)
  • types of sensors (digital cameras)
    • charge coupled device CCD : convert photon into charges that are collected on each photosite
    • complementary metal oxide semiconductor CMOS : each photosite has its own converter
    • photosensitive material (silicon)
    • arrangement : photon cells in rows or areas
  • sensor response : charge capacity proportional to incident illuminance (still linear at 80% until curvy staturation)
  • dynamic range : ratio of charge capcity (electrons) to dark current (read noise) can be expressed in db
  • demosaicking : with color filter array (CFA)
    • mono-CCD : one color per spatial position
    • bilinear interpolation : can cause blur effect and false color
    • hue based interpolation : using natural correlation between color channels $R=G+k_R$ and $B=G+k_B$
      • bilinear interpolation first G
      • compute ratio R/G and B/G (use log space to transform division into substraction)
      • bilinear interpolation of R/G and B/G
      • multiply R/G and B/G by G
    • template matching : changing interpolation following edges direction (if over threshold)
    • gradient based algorithms : bilinear weighted by an edge function sampling : if the sampling frequency is double the maximum frequency, then the function can be recovered $F_{\text{sampling}}&gt;2 F_{\text{max(signal)}}$ (for 2D images $1/\Delta i &gt; 2F_{i\text{max}}$ where $i$ is $x$ or $y$ and $\Delta i$ is the distance between two adjacent pixels)
    • Bayer CFA sampling is not the same for red, blue and green
    • can be represented by row of dirac impulse
    • aliasing : for avoiding it, sampling interval $\delta x$ require $\frac{1}{2\delta x}&gt;u_c$ where $u_c$ is maximum frequency$
    • Nyquist frequency : maximum frequency that can be recovered (higher one will be aliased) $u_N=\frac{1}{2\delta x}$
    • moiré : strip artefact occuring due to aliashing (sub-sampling)
    • anti-aliasing filter : optical blur, diffraction
  • fourier transform of CFA

Colorimetry

  • white balancing : process of removing color casts due to illuminant so that objects that are white remain white
  • LMS color space : represent three types of cones named after their sensitivity (long, medium, short wavelengths)
  • color matching : any visible color can be matched as an additive combination of three color primaries $C^t(\lambda)=RC^r(\lambda)+GC^g(\lambda)+BC^b(\lambda)$
    • symmetric
    • transitive
    • proportional
    • additive
    • experiment : CIE define standard colorimetric observer (2° 1931, 10° 1964)
    • CIE XYZ : standard color matching function (sensitivity function) used for sRGB, cone fundamentals calculations, color measurements and color difference encoding
  • opponent color : four fundamental colors (red, yellow, green, blue) and combinations red-blue is purple, red-yellow is orange but no red-green nor blue-yellow
    • oppencency works on cone constrast not on quantum catches of cones
    • spectrally opponent processes : color chromiance (red vs green, yellow vs blue)
    • spectrally non-opponent process : lightness (black vs white)
    • opponent encoding is most efficient encoding of LMS cone responses from information theoretical point of view
  • YCbCr : Y is luma, Cb is blue-yellow, Cr is red-green
  • XYZ from SPD : $X=K\int_\lambda S(\lambda)E(\lambda)x(\lambda)d\lambda$ (reciprocally with $y$ and $z$) where $K=100/\int_\lambda E(\lambda)y(\lambda)d\lambda$
    • perfect diffusor : $y=100$
    • perfect reflector : $S(\lambda) = 1$
    • Y tristimulus value : represents luminance of a color (absolute lumiance in cd/m^2 if not normalized)
    • most calculations use Y = 100 as whitest point
    • white paper is assumed to be a perfect reflector under illuminant D50
    • white monitor is assumed to be a perfect diffusor under illuminant D50
    • X, Z do not correspond to any perceptual attributes
  • chromacity diagram : projecting color value to unity vector (e.g. $r=R/(R+G+B)$)
    • no distance metric is defined
    • CIE XYZ 1931
    • CIE 1976 Lightness L*
    • CIE LUV : lighting and display
    • CIE LAB : reflecting stimuli (opponent color space derived through psychophysical experiments)
  • color difference formula : $\Delta E_{ab}^=\sqrt{(\Delta L^)^2+(\Delta a^)^2+(\Delta b^)^2}$ where $E_{ab}^*\sim 1$ is consider a good approximation of just noticeable difference (JND)
  • metamerism : 2 colors appear the same with different spectral reflectance factors under one illuminant but different under another
  • adjusted difference metrics : account problematic areas for creating an uniform color space
  • chromatic adaptation : ability of HVS to adapt to color of the illuminant and preserve at best appearance of an object (color matching not applicable)
    • sensory mechanism : independent sensitivity control at photo receptors and neurons of first visual processing stages
    • cognitive mechanism : all object we already know are regarded as familiar aspect through color memory
  • Von Kries model : linear transform from XYZ to relative cone responses RGB under test illuminant
  • no cone space : no real consensus, except that the cone space need to be sharp (less correlated than LMS cone responses), most camera sensitivites are considered as sharp
  • color constancy : find either the illumiant or reflectances from RGB data (white balacing)
    • lightness algorithms : grey-world, maximum RGB
    • linear model algorithms
    • neural network algorithms
    • probabilistic algorithms : color by correlation
    • based on real world surfaces (using specularities)

Colorspace

  • colorspace : geometric representations of colors in space usually of three dimensions
    • color appearance : CIE AM97, CIE AM02
    • device dependent : no direct relationship with CIE colorimetry (spectral characteristics, color component function, white-point need to be specified)
  • color component transfer : non-linear to mimic HVS $C'=\alpha C^\gamma$ where $C$ is color channel and $\gamma$ range from $1/1.8$ to $1/2.4$
  • color space plus digital encoding
    • digital code value range : $m$-bit associated to color space range $n$-bit
    • values outside min and max are clipped to bounds
    • same color space can result in different color encodings
    • image state : color rendering of the image data
      • input-referred : scene or original, image data represents an estimate of colorimetry
      • output-referred : image data represents colorimetry of original data color rendered to a real output device
  • HSV : hue, saturation, value (based on untuitive color characteristics as tint, shade and tone)
  • YIQ : Y = luma or brightness, I = in-phase, Q = quadrature (last two are chrominance componenent), linear used in analog NTSC television (YUV in PAL television)
  • YcbCr : families of non-linear/gamma encoding, transform ensure an approximately oppenent (luma, chrominance)
  • color order system : organize color perception
    • Munsell tree (hue, value, chroma)
    • natural color system
    • uniform color system
    • Pantone : primarily used in printing
  • from RAW to sensor-referred : transform from camera response to tri-stimulus values $s=f(p)$ where $p=$RGB and $s=$XYZ
    • simple method : assume camera response is linear $S=PM$
    • in reality : camera do not fulfil Luther condition $S\sim PM$
    • error minimization : $M=f(S, P)\text{ s.t. }min||PM-S||$ via least squares error (pseudoinverse $M=P+S$)

Tone mapping

  • tone mapping : mapping between the input image dynamic range to the output medium dynamic range
    • LDR scene : expansion
    • HDR scene : compression
    • global : spatially uniform, fast, loss of contrast, not sufficient for HDR
    • local : size varying, mimic human local adaptation, slow, artefacts
  • contrast ratio : second name of dynamic range, ratio between brightest and darkest object in scene (e.g. HDF display 1:25'000, natural world 1:100'000'000)
  • intent : reproduction goal, observer preferences, colorimetric accuraxy, perceptual accuracy
  • preferred reproduction : saturation enhancement, sharpness increase, noise removal, detail visibility improvement
  • black and white point correction : global, map white point of image to brightest display luminance by using min of low-passed luminance image (same for dark)
  • Holm's photographic tone reproduction : global, S-shaped
    • based on technique called Zone System
    • scene statistics : variables b, c
    • output media range : variable a
    • viewing condition : variable a
  • Ward histogram adjustment : global, redistribution of gray levels to obtain flat histogram
  • Retinex : local, compute new pixel based on spatial relationships of others (path, iterative, surround)
    • edges generate large changes in sensation
    • use spatial comparaison between pixels
    • no longer compared to external reference
    • 4 operations based : ratio of radiances, product for propagating ratio comparisons atimes image, reset discount uniform illuminant, (weighted) average reset product with old value at pixel
  • analogy with retinal processing
  • adaptive non-linearity : $y=\frac{x}{x+x0}$ where $x$ is existant radiance and $x0$ the adaptation factor
  • luminance estimation through horizontal cells feedback (OPL)
  • gamut mapping : colorimetric and perceptual accuracy (need to take into account viewing conditions, absolute scene colorimetric is rendered)

Encoding and file format

  • image communication : limited bandwidth, broad distribution, need for standard, error resilience
  • coding : representation of information with symbols
  • source coding : compression, reduction of redundancy data rate
  • channel coding : adapting data to transmission channel
  • information theory : $I=-\log_2P(a_k)$
  • entropy : uncertainty, average information, theorical limit for lossless compression $H=-\sum_kP(a_k)log_2P(a_k)$
  • source coding methods
    • variable length entropy coding : Huffman coding, arithmetic coding (JPEG)
    • inter-symbol redundancies : Lempel-Ziv coding, run-length coding, differential/predicitive coding
    • additional redundancies : transform coding (perceptual), color space onversion (color), motion prediction (temporal)
  • Lempel-Ziv coding : dictionary based (id,c), add concatenation of string+c to dictionary with new id, decoder recreates dictionary from data, computed on 1 run
    • LZ77 : pointer to previous occurrence and length, followed by huffman coding, flate encoding (zip)
    • LZ78, LZW : dictionary of previous occurences
  • lossy compression : quality evaluated by a rate-distortion curve
  • transform coding : energy compaction/decorrelation, projection onto new basis, frequency separation, fast
  • discrete consine transform (DCT) : symmetric basis functions, quasi-periodic signal repetition, no discontinuities
  • JPEG compression : simple, low computation and memory, parallel processing possible, local spatial adaptation, low quality of lossy
  • JPEG 2000 : improved compression efficiency, multiple resolution representation, tiling, error resilience, rate control, discrete wavelet transform (DWT), high flexibility, large functionnality, blurriness, ringing, complex
  • wavelets
  • progressive coding : flexibility in compression bit stream organisation, rearrangement of bitstream, each layer increase image quality (signal-to-noise ratio) scalability, packets
  • region of interest (ROI) coding : non-uniform quality distribution, higher quality, higher overall compression ratio
    • static ROI : defined at encoding time, storage, remote sensing
    • dynamic ROI : defined interactively on need and transmission, telemedicine, mobile communications
  • file format : metadata (color space, little endian, sample per pixel, resolution), (un)compressed data
    • GIF : graphic interchange format, 256 color palette, LZW (horizontal), animations
    • PNG : 24-bit RGB, grayscale, loseless, alpha channel, gamma correction, color management
    • JFIF : JPEG file interchange format, grayscale, RGB, CMYK, YcbrCr, gamma = 1.0, progressive JPEG
    • EXIF : JFIF with camera metadata
    • TIFF : multispectral images, RAW encodings, 1-64 bits/pixel/channel, packbits, LZW, JPEG encapsulation
  • color palettes : original colors by indices

Optics lenses

  • pinhole camera model : idealized model that defines perpective projection, small hole and star of lines, hole acts as selector, $d\approx \frac{\sqrt{f}}{28}$ where $d$ is hole diameter and $f$ sensor distance, too dark
  • lenses : $P=1/f=(n_d-1)(1/r_1-1/r_2)$ where $P$ is optical power (dioptre 1/m), $f$ focal length, $n_d$ refractive index, $r_i$ radii of surface curvatures
    • convex : convergent
    • concave : divergent
    • photographic lenses : one or more elements, focal length, angle of view, aperture (field depth), resolving power
    • normal perspective : angle of view as human, ~50°, given 24x36mm sensor size (full frame) focal length ~50mm for normal lens
    • compressed perspective : telephoto lens, smaller angle of view
    • extended perspective : wide angle lens, larger angle of view
  • prism : dispersion, in general for visible light $1&lt;n(\lambda_r)&lt;n(\lambda_g)&lt;n(\lambda_b)$
  • aberration : perfect lens do not exist
    • chromatic abberration : lens focusing different wavelength onto different focal planes, sensitive to aperture size
    • abstigmatim : similar to chromatic, off-axis image of a point appearing as a line or ellispe instead of a point
    • spherical aberrations : focal length of lens vary depending on how far a point is from center of lens, need to close aperture or combine convergent and divergent lens
    • coma : similar to spherical, rays entering lens at angle, focal point of lens will vary the further away the ray hits the lens from the center, burring the image
    • distortion
    • curvature of field
  • vignetting : lens does not fit the sensor
  • aperture : closing it reduce the number of rays that are imaged, reducing depth
  • circle of confusion : depending on aperture, only object points in the object plane will be imaged as points, before and behind will be imaged as circles
    • maximum permissible circle : largest circle of blur on a senser that will still be perceived by the HVS as a clean point when printed at 30cm and viewed at 50cm
  • depth of field
  • resolving power : smallest optical spot a lens can produce, dependent on diffraction
  • Raleigh criterion : minimum resolvable detail, imaging process is said to be diffraction-limited when first diffraction minimum of the image of one source point coincides with the maximum of another
    • angular limit resolution : $\Delta\theta =\frac{1.22\lambda}{d}$
    • spatial limit of resolution : $\Delta l_{min}=\frac{1.22 f\lambda}{d}=1.22N\lambda$
  • modulation transfer function (MTF) : transfer function of a system showing how well the original contrast has been captured
    • modulation : difference in response $M=\frac{M_\text{image}}{M_{\text{stimulus}}}$