
Support for errest/residual of individual field variables. #116

Merged: 1 commit, Feb 23, 2017

Conversation

nloppi (Member) commented Jan 24, 2017

No description provided.

nloppi (Member, Author) commented Jan 24, 2017

Hi Freddie,

I opened the pull request early just to see if you are happy with the structure. I am still polishing the code, checking formatting and fixing naming inconsistencies. The Phi backend still needs to be tested as well.
Thanks!

nloppi force-pushed the resid branch 3 times, most recently from 5ef5f73 to fa807c5 (January 24, 2017 14:39)
.format(npdtype_to_ctype(x.dtype))
# Determine the blocksize
block = (128, 1, 1)
# Determine the gridsize
Contributor:

Add newline.


# Build the reduction kernel
rkern = self._build_kernel(
'errest', src, [np.int32]*3 + [np.intp]*4 + [dtype]*2)
Contributor:

Move ) onto newline.

block = (128, 1, 1)
# Determine the gridsize
grid = (int((ncolb + block[0] - 1)/block[0]), 1, 1)
# Empty result buffer on host, shape = (nvars, nblocks)
Contributor:

Add newline.

grid = (int((ncolb + block[0] - 1)/block[0]), 1, 1)
# Empty result buffer on host, shape = (nvars, nblocks)
err_host = cuda.pagelocked_empty((ncola, grid[0]), dtype, 'C')
# Device memory allocation
Contributor:

Add newline.
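The launch-parameter snippets above use the standard ceil-division idiom to size the grid. As a quick sanity check (the column count here is hypothetical, not from the PR):

```python
# Sketch of the host-side launch parameters discussed above; names
# (block, grid, ncolb) follow the snippet, the sizes are assumptions.
ncolb = 1000          # hypothetical number of columns to reduce over
block = (128, 1, 1)   # threads per block

# Ceil-division: enough blocks to cover every column
grid = ((ncolb + block[0] - 1) // block[0], 1, 1)

print(grid)  # (8, 1, 1) -- 8 * 128 = 1024 >= 1000
```

The result buffer on the host then has one slot per (variable, block) pair, matching the `(ncola, grid[0])` shape in the diff.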

}

// Last warp
if (tid < 32){
Contributor:

Move { onto newline.

}

// Unrolled reduction within blocks except the last block
if (blockIdx.x != gridDim.x - 1){
Contributor:

Move { onto newline.


// Unrolled reduction within blocks except the last block
if (blockIdx.x != gridDim.x - 1){
if (tid < 64){
Contributor:

Move { onto newline.

}

// Copy to global memory
if (tid == 0){
Contributor:

Move { onto newline.

}

// Last block reduced with a variable sized loop
else{
Contributor:

Move { onto newline.

% endfor
}
}

Contributor:

Remove newline.


// Last block reduced with a variable sized loop
else{
for(int s = 1; s < lastblksize; s *= 2){
Contributor:

Move { onto newline.

// Last block reduced with a variable sized loop
else{
for(int s = 1; s < lastblksize; s *= 2){
if (tid % (2*s) == 0) {
Contributor:

Move { onto newline.


// Last warp
if (tid < 32){
% for n in [32, 16, 8, 4, 2, 1]:
Contributor:

Use pyfr.ndrange.

nrow, ldim, dtype = x.traits
ncola, ncolb = x.ioshape[1:]

# Determine the "blocksize"
Contributor:

Use OpenCL terminology.

# L2 norm
if self._norm == 'l2':
# Reduce locally (element types) and globally (MPI ranks)

Contributor:

Remove newline.

# Normalise
res = np.sqrt(rg/self._gndofs)

# Linfinity norm
Contributor:

L^


# Normalise
res = np.sqrt(rg/self._gndofs)

Contributor:

Remove newline.

rg = [comm.allreduce(r, op=get_mpi('sum')) for r in rl]

# Normalise
res = np.sqrt(rg/self._gndofs)
Contributor:

Space around /
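For reference, the normalisation step under review amounts to a root-mean-square residual per field variable; a minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical globally reduced sum-of-squares per field variable
rg = np.array([4.0, 16.0])
gndofs = 4.0  # hypothetical global number of degrees of freedom

# Normalise: RMS residual per field variable
res = np.sqrt(rg / gndofs)

print(res)  # [1. 2.]
```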

# Reduce locally (element types) and globally (MPI ranks)

rl = [np.sum(e) for e in zip(*errest.retval)]
rg = [comm.allreduce(r, op=get_mpi('sum')) for r in rl]
Contributor:

MPI can do vector -> vector reductions (see force calculation plugin for an example).
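A NumPy-only sketch of the vector form the reviewer is suggesting (the values are hypothetical; the actual MPI call is left as a comment since it needs a live communicator):

```python
import numpy as np

# Per-element-type partial residuals; e.g. one tuple per element type
# (hypothetical values)
retval = [(1.0, 2.0), (3.0, 4.0)]

# Local reduction over element types as a single vector operation,
# replacing the per-component list comprehension in the diff
rl = np.sum(retval, axis=0)

# With mpi4py the whole vector can then be reduced in one call, e.g.
#   rg = comm.allreduce(rl, op=get_mpi('sum'))
# (assumes comm and get_mpi are in scope, as elsewhere in the plugin)

print(rl)  # [4. 6.]
```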

@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-

import math
import numpy as np
Contributor:

Add newline. (NumPy is not a standard Python package and so goes into its own stanza.)

self.queue = queue
rkern(queue.cl_queue_comp, (ncolb, 1, 1), block,
nrow, ncolb, ldim, err_dev, x.data, y.data,
z.data, atol, rtol)
Contributor:

Move ) onto newline.

zarr = Array(qcomp, cnt, dtype, data=z.data)
# Reduction across blocks outsourced to numpy
if norm == 'uniform':
return np.max(err_host, axis=1)
Contributor:

Outside the class define something like

reducer = np.max if norm == 'uniform' else np.sum

then just do return reducer(err_host, axis=1).
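The reviewer's suggestion, written out as a runnable sketch (the `err_host` contents are hypothetical, and `finalise` is just an illustrative wrapper name):

```python
import numpy as np

def finalise(err_host, norm):
    # Pick the reduction once, then apply it across blocks
    reducer = np.max if norm == 'uniform' else np.sum
    return reducer(err_host, axis=1)

# Hypothetical (nvars, nblocks) partial results
err_host = np.array([[1.0, 3.0],
                     [2.0, 2.0]])

print(finalise(err_host, 'uniform'))  # [3. 2.]
print(finalise(err_host, 'l2'))       # [4. 4.]
```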

FreddieWitherden (Contributor) commented:

The biggest things: the block size is currently always 128. This could be a problem in relation to running out of shared memory (think MHD, where we have 9 field variables). Adapt the code to support any power-of-two block size. The result should be one for loop with an if statement to decide whether we need to sync threads or not.

Secondly, the OpenCL code is probably wrong in the sense that OpenCL, unlike NVIDIA, does not have warps and does not guarantee that groups of 32 threads execute synchronously. As such you always need the barrier.
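A pure-Python model of the generalised power-of-two reduction being requested; this sketches the loop structure only, with the barrier placement noted in comments (the real kernel would be CUDA/OpenCL C):

```python
def block_reduce(vals):
    # Model of an in-block tree reduction for any power-of-two block size
    vals = list(vals)
    n = len(vals)
    assert n & (n - 1) == 0, 'block size must be a power of two'

    s = n // 2
    while s > 0:
        # Each "thread" tid in the first half folds in its partner
        for tid in range(s):
            vals[tid] += vals[tid + s]
        # In the real kernel: if s > 32, call __syncthreads() here on
        # CUDA; on OpenCL the barrier is needed at EVERY step, since
        # there is no warp-synchronous execution guarantee
        s //= 2

    return vals[0]

print(block_reduce(range(128)))  # 8128 (sum of 0..127)
```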

nloppi (Member, Author) commented Jan 24, 2017

Thanks for the comments, Freddie!

Yes, actually I always found a difference in the 5th or 6th digit between OpenCL (my desktop) and CUDA (nighthawk), and was planning to double-check on the test machine that has both backends. It must be related to this synchronisation behaviour.

nloppi (Member, Author) commented Jan 27, 2017

Now the OpenCL and CUDA outputs match as well. I still need to test Phi.

nloppi force-pushed the resid branch 2 times, most recently from 86d9814 to da8f2ec (February 20, 2017 15:18)
@@ -63,6 +64,12 @@ def __init__(self, intg, cfgsect, suffix=None):
raise RuntimeError('System {0} not supported by plugin {1}'
.format(intg.system.name, self.name))

# Check that we support this particular integrator formulation
if not ('*' in self.formulations
Contributor:

Overkill to support *. Just check for intg.formulation in self.formulations and require plugins to specify both std and dual.

@@ -14,6 +15,11 @@ def ndrange(context, *args):
return util.ndrange(*args)


def ilog2range(context, x):
n = x.bit_length() - 1
return [int(x/(2**(i + 1))) for i in range(n)]
Contributor:

In [8]: x = 256

In [9]: [2**i for i in range(x.bit_length())]
Out[9]: [1, 2, 4, 8, 16, 32, 64, 128, 256]
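Side by side, the float-division version from the diff and an integer-only formulation of the same descending powers, built from the reviewer's observation above:

```python
def ilog2range(x):
    # Float-division version from the diff under review
    n = x.bit_length() - 1
    return [int(x / 2**(i + 1)) for i in range(n)]

def ilog2range_v2(x):
    # Integer-only version: all powers of two up to x, then the
    # slice drops x itself and reverses the order
    return [2**i for i in range(x.bit_length())][-2::-1]

print(ilog2range(256))     # [128, 64, 32, 16, 8, 4, 2, 1]
print(ilog2range_v2(256))  # [128, 64, 32, 16, 8, 4, 2, 1]
```

The two agree for power-of-two inputs, which is the case relevant to the reduction block sizes.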


# Norm type
reduce_expr = 'a + b' if norm == 'l2' else 'max(a, b)'
# Determine the reduction block dimensions
Contributor:

Misleading comment.

{
% for i in range(ncola):
idx = r*ldim + X_IDX_AOSOA(${i}, ${ncola});
% if norm == 'uniform':
Contributor:

Indentation.

block = (128, 1, 1)

# Determine the number of groups
grid = ((ncolb + block[0] - 1)//block[0], 1, 1)
Contributor:

Space around //


self._retarr = rkern(xarr, yarr, zarr, atol, rtol,
queue=qcomp)
self.queue = queue
Contributor:

?

{
% for i in range(ncola):
idx = r*ldim + X_IDX_AOSOA(${i}, ${ncola});
% if norm == 'uniform':
Contributor:

Indentation.

@@ -63,6 +64,11 @@ def __init__(self, intg, cfgsect, suffix=None):
raise RuntimeError('System {0} not supported by plugin {1}'
.format(intg.system.name, self.name))

# Check that we support this particular integrator formulation
if not (intg.formulation in self.formulations):
Contributor:

not in.

FreddieWitherden (Contributor) commented Feb 21, 2017

In terms of the stats plugin I would change the code to firstly keep track of the total (global) number of pseudo iterations. Say, self.npseudoiter. Then I would do:

if monitoring:
    <stuff>
    self.pseudostats.append((self.npseudoiter, i, ...))
else:
    self.pseudostats.append((self.npseudoiter, i, None))

self.npseudoiter += 1

This way any plugin always knows the number of iterations taken and there is no issue with rejected steps (i resets, but that's fine, as npseudoiter is always increasing no matter what).
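A minimal model of the counter logic described above (the surrounding plugin machinery is omitted, and `step` is a hypothetical method name for illustration):

```python
class PseudoStats:
    # Model of the always-increasing global pseudo-iteration counter
    def __init__(self):
        self.npseudoiter = 0
        self.pseudostats = []

    def step(self, i, resid=None):
        # resid is None when monitoring is off for this iteration
        self.pseudostats.append((self.npseudoiter, i, resid))
        self.npseudoiter += 1

s = PseudoStats()
s.step(0, 0.5)
s.step(1)        # monitoring off: residual recorded as None
s.step(0, 0.25)  # i reset by a rejected step; npseudoiter keeps growing

print(s.pseudostats)
```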

nloppi (Member, Author) commented Feb 21, 2017

I did as you suggested. Now it prints (None,)*nvars if the residuals are not computed.

@@ -2,6 +2,7 @@

from collections import Iterable
import itertools as it
import math
Contributor:

Remove import.

@@ -14,6 +15,10 @@ def ndrange(context, *args):
return util.ndrange(*args)


def ilog2range(context, x):
return [2**i for i in range(x.bit_length())][-2::-1]
Contributor:

Rework range: range(x.bit_length() - 2, -1, -1).
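The two formulations are equivalent; a quick check (x is a hypothetical power-of-two block size):

```python
x = 128

# Version under review: build all powers, then slice-and-reverse
a = [2**i for i in range(x.bit_length())][-2::-1]

# Reviewer's reworked range, avoiding the throwaway slice
b = [2**i for i in range(x.bit_length() - 2, -1, -1)]

print(a == b)  # True
print(a)       # [64, 32, 16, 8, 4, 2, 1]
```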

block = (128, 1, 1)

# Determine the number of groups
grid = ((ncolb + block[0] - 1) // block[0], 1, 1)
FreddieWitherden (Contributor) commented Feb 22, 2017:

Expression is different to that used by the CUDA backend, check equivalence. We also do not use this variable properly.

Member Author:

Changed so that it is passed to the kernel

# Norm type
reduce_expr = 'a + b' if norm == 'l2' else 'max(a, b)'
# Reduction workgroup dimensions
block = (128, 1, 1)
Contributor:

I do not believe that PyOpenCL requires the additional dimensions if they're 1.

Member Author:

File "/home/niki/python3venv/venv/lib/python3.5/site-packages/pyopencl-2016.1-py3.5-linux-x86_64.egg/pyopencl/cffi_cl.py", line 1192, in enqueue_nd_range_kernel
work_dim = len(global_work_size)
TypeError: object of type 'int' has no len()

Contributor:

Try (128,).

@@ -15,7 +15,9 @@ def __init__(self, *args, **kwargs):
self._fnsteps = self.cfg.getint('soln-filter', 'nsteps', '0')

# Stats on the most recent step
self.stepinfo = []
self.pseudostepinfo = []
# Pseudo-step counter
Contributor:

Add newline.

Contributor:

Also dump into stats file (see collect_stats method).

Member Author:

Do you mean dumping the global iteration number self.npseudosteps into stats?

self._pseudo_aresid = self.cfg.getfloat(sect, 'pseudo-aresid')
self._pseudo_rresid = self.cfg.getfloat(sect, 'pseudo-rresid')
self._pseudo_residtol= self.cfg.getfloat(sect, 'pseudo-resid-tol')
self._norm = self.cfg.get(sect, 'pseudo-resid-norm', 'uniform')
Contributor:

Prefix with pseudo.

# The root rank needs to open the output file
if rank == root:
self.outf = init_csv(self.cfg, cfgsect,
'total_niter,t,pseudo-iter,' + fvars)
Contributor:

Inconsistent column names.

nloppi force-pushed the resid branch 2 times, most recently from 99d4850 to 03804bb (February 22, 2017 22:05)
nloppi (Member, Author) commented Feb 22, 2017

Now npseudosteps is dumped into the stats file.

Because the writer plugin is called for t=0.0 in the __init__ of BaseIntegrator, I need to initialise npseudosteps = 0 there. I can also do it in DualController/BaseDualIntegrator by switching the order:

def __init__(self, *args, **kwargs):
    self.npseudosteps = 0
    super().__init__(*args, **kwargs)

Which one do you prefer?


self._retarr = rkern(xarr, yarr, zarr, atol, rtol,
queue=qcomp)
rkern(queue.cl_queue_comp, (ncolb, 1, 1), block,
Contributor:

Might be able to get away with (ncolb,)

FreddieWitherden (Contributor) commented:

Putting it in the __init__ function before we call super will be fine.
