Support for errest/residual of individual field variables. #116
Conversation
Hi Freddie, I opened the pull request early just to see if you are happy with the structure. I am still polishing the code, checking formatting and renaming inconsistencies. The Phi backend still needs to be tested as well.
Force-pushed 5ef5f73 to fa807c5 (compare)
pyfr/backends/cuda/blasext.py
Outdated
.format(npdtype_to_ctype(x.dtype))
# Determine the blocksize
block = (128, 1, 1)
# Determine the gridsize
Add newline.
pyfr/backends/cuda/blasext.py
Outdated
# Build the reduction kernel
rkern = self._build_kernel(
    'errest', src, [np.int32]*3 + [np.intp]*4 + [dtype]*2)
Move ) onto newline.
pyfr/backends/cuda/blasext.py
Outdated
block = (128, 1, 1)
# Determine the gridsize
grid = (int((ncolb + block[0] - 1)/block[0]), 1, 1)
# Empty result buffer on host, shape = (nvars, nblocks)
Add newline.
pyfr/backends/cuda/blasext.py
Outdated
grid = (int((ncolb + block[0] - 1)/block[0]), 1, 1)
# Empty result buffer on host, shape = (nvars, nblocks)
err_host = cuda.pagelocked_empty((ncola, grid[0]), dtype, 'C')
# Device memory allocation
Add newline.
}

// Last warp
if (tid < 32){
Move { onto newline.
}

// Unrolled reduction within blocks except the last block
if (blockIdx.x != gridDim.x - 1){
Move { onto newline.
// Unrolled reduction within blocks except the last block
if (blockIdx.x != gridDim.x - 1){
if (tid < 64){
Move { onto newline.
}

// Copy to global memory
if (tid == 0){
Move { onto newline.
}

// Last block reduced with a variable sized loop
else{
Move { onto newline.
% endfor
}
}
Remove newline.
// Last block reduced with a variable sized loop
else{
for(int s = 1; s < lastblksize; s *= 2){
Move { onto newline.
// Last block reduced with a variable sized loop
else{
for(int s = 1; s < lastblksize; s *= 2){
if (tid % (2*s) == 0) {
Move { onto newline.
// Last warp
if (tid < 32){
% for n in [32, 16, 8, 4, 2, 1]:
pyfr.ndrange
pyfr/backends/opencl/blasext.py
Outdated
nrow, ldim, dtype = x.traits
ncola, ncolb = x.ioshape[1:]

# Determine the "blocksize"
Use OpenCL terminology.
pyfr/integrators/dual/controllers.py
Outdated
# L2 norm
if self._norm == 'l2':
# Reduce locally (element types) and globally (MPI ranks)
Remove newline.
pyfr/integrators/dual/controllers.py
Outdated
# Normalise
res = np.sqrt(rg/self._gndofs)

# Linfinity norm
L^
pyfr/integrators/dual/controllers.py
Outdated
# Normalise
res = np.sqrt(rg/self._gndofs)
Remove newline.
pyfr/integrators/dual/controllers.py
Outdated
rg = [comm.allreduce(r, op=get_mpi('sum')) for r in rl]

# Normalise
res = np.sqrt(rg/self._gndofs)
Space around /
pyfr/integrators/dual/controllers.py
Outdated
# Reduce locally (element types) and globally (MPI ranks)
rl = [np.sum(e) for e in zip(*errest.retval)]
rg = [comm.allreduce(r, op=get_mpi('sum')) for r in rl]
MPI can do vector -> vector reductions (see force calculation plugin for an example).
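The per-entry loop of scalar allreduces could be replaced by a single vector reduction. A minimal sketch of the idea (the `retvals` data is hypothetical, and the mpi4py call is shown only in a comment since it needs a running MPI environment):

```python
import numpy as np

# Hypothetical per-element-type residual vectors, one entry per field variable
retvals = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]

# Reduce locally over element types into a single vector
rl = np.sum(retvals, axis=0)

# A single vector -> vector allreduce then replaces the per-entry loop, e.g.
#   rg = np.empty_like(rl)
#   comm.Allreduce(rl, rg, op=MPI.SUM)
```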
pyfr/integrators/std/controllers.py
Outdated
@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-

import math
import numpy as np
Add newline. (NumPy is not a standard Python package and so goes into its own stanza.)
pyfr/backends/opencl/blasext.py
Outdated
self.queue = queue
rkern(queue.cl_queue_comp, (ncolb, 1, 1), block,
      nrow, ncolb, ldim, err_dev, x.data, y.data,
      z.data, atol, rtol)
Move ) onto newline.
pyfr/backends/opencl/blasext.py
Outdated
zarr = Array(qcomp, cnt, dtype, data=z.data)
# Reduction across blocks outsourced to numpy
if norm == 'uniform':
    return np.max(err_host, axis=1)
Outside the class define something like
reducer = np.max if norm == 'uniform' else np.sum
then just do return reducer(err_host, axis=1).
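The suggestion amounts to hoisting the norm test out of the data path. A sketch under the assumption that err_host has shape (nvars, nblocks); the helper name is illustrative:

```python
import numpy as np

def make_finaliser(norm):
    # Pick the cross-block reducer once, not on every invocation
    reducer = np.max if norm == 'uniform' else np.sum
    return lambda err_host: reducer(err_host, axis=1)

# Hypothetical partial results: two variables, two blocks
err_host = np.array([[1.0, 4.0],
                     [3.0, 2.0]])
```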
Biggest things: the blocksize is currently always 128. This could be a problem in relation to running out of shared memory (think MHD, where we have 9 field variables). Adapt the code to support any power-of-two blocksize. The result should be one for loop with an if statement to decide if we need to sync threads or not. Secondly, the OpenCL code is probably wrong in the sense that, unlike NVIDIA, it does not have warps and does not guarantee that groups of 32 threads execute synchronously. As such you always need the barrier.
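The requested structure, one loop with a conditional sync, can be modelled on the host. This Python sketch only illustrates the shape of the reduction; the barrier rule is noted in a comment:

```python
import numpy as np

def block_reduce(vals):
    # Tree reduction over a power-of-two block: one loop of halving strides
    vals = np.asarray(vals, dtype=float).copy()
    s = len(vals) // 2
    while s >= 1:
        # On CUDA a __syncthreads() is only needed here while s >= 32, since
        # a single warp executes in lockstep; OpenCL offers no such guarantee,
        # so there the barrier would be unconditional.
        vals[:s] += vals[s:2*s]
        s //= 2
    return vals[0]
```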
Thanks for the comments, Freddie! Yes, I actually always found a difference in the 5th or 6th digit between OpenCL (my desktop) and CUDA (nighthawk) and was planning to double-check it on the test machine that has both backends. It must be related to this synchronisation feature.
Now the OpenCL and CUDA outputs match as well. I still need to test the Phi backend.
Force-pushed 86d9814 to da8f2ec (compare)
pyfr/plugins/base.py
Outdated
@@ -63,6 +64,12 @@ def __init__(self, intg, cfgsect, suffix=None):
raise RuntimeError('System {0} not supported by plugin {1}'
                   .format(intg.system.name, self.name))

# Check that we support this particular integrator formulation
if not ('*' in self.formulations
Overkill to support *. Just check for intg.formulation in self.formulations and require plugins to specify both std and dual.
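The suggested check reduces to a plain membership test. A hypothetical minimal version (the class body and error message are illustrative, not the project's actual code):

```python
class BasePlugin:
    # Plugins list every formulation they support explicitly
    formulations = ['std', 'dual']

    def __init__(self, intg):
        # No '*' wildcard needed: a simple membership test suffices
        if intg.formulation not in self.formulations:
            raise RuntimeError('Formulation {0} not supported by plugin'
                               .format(intg.formulation))
```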
pyfr/backends/base/makoutil.py
Outdated
@@ -14,6 +15,11 @@ def ndrange(context, *args):
return util.ndrange(*args)

def ilog2range(context, x):
    n = x.bit_length() - 1
    return [int(x/(2**(i + 1))) for i in range(n)]
In [8]: x = 256
In [9]: [2**i for i in range(x.bit_length())]
Out[9]: [1, 2, 4, 8, 16, 32, 64, 128, 256]
pyfr/backends/cuda/blasext.py
Outdated
# Norm type
reduce_expr = 'a + b' if norm == 'l2' else 'max(a, b)'
# Determine the reduction block dimensions
Misleading comment.
{
% for i in range(ncola):
idx = r*ldim + X_IDX_AOSOA(${i}, ${ncola});
% if norm == 'uniform':
Indentation.
pyfr/backends/opencl/blasext.py
Outdated
block = (128, 1, 1)

# Determine the number of groups
grid = ((ncolb + block[0] - 1)//block[0], 1, 1)
Space around //
pyfr/backends/opencl/blasext.py
Outdated
self._retarr = rkern(xarr, yarr, zarr, atol, rtol,
                     queue=qcomp)
self.queue = queue
?
{
% for i in range(ncola):
idx = r*ldim + X_IDX_AOSOA(${i}, ${ncola});
% if norm == 'uniform':
Indentation.
pyfr/plugins/base.py
Outdated
@@ -63,6 +64,11 @@ def __init__(self, intg, cfgsect, suffix=None):
raise RuntimeError('System {0} not supported by plugin {1}'
                   .format(intg.system.name, self.name))

# Check that we support this particular integrator formulation
if not (intg.formulation in self.formulations):
not in.
In terms of the stats plugin I would change the code to firstly keep track of the total (global) number of pseudo-iterations, say self.npseudoiter. Then I would do:
this way any plugin always knows the number of iterations taken and there is no issue with rejected steps (i resets, but that's fine, as npseudoiter is always increasing no matter what).
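The counter pattern being described might look like the following sketch; the class and method names are hypothetical, and only the self.npseudoiter attribute comes from the discussion:

```python
class DualController:
    def __init__(self):
        # Monotonic count of pseudo-iterations over the whole run; it is
        # never reset, so rejected steps do not disturb it
        self.npseudoiter = 0

    def pseudo_advance(self, maxiters):
        # The per-step index i may reset on rejection, but npseudoiter
        # keeps increasing regardless
        for i in range(maxiters):
            self.npseudoiter += 1
            # ... perform one pseudo-iteration, break on convergence ...
```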
I did as you suggested. Now it prints (None,)*nvars if the residuals are not computed.
pyfr/backends/base/makoutil.py
Outdated
@@ -2,6 +2,7 @@

from collections import Iterable
import itertools as it
import math
Remove import.
pyfr/backends/base/makoutil.py
Outdated
@@ -14,6 +15,10 @@ def ndrange(context, *args):
return util.ndrange(*args)

def ilog2range(context, x):
    return [2**i for i in range(x.bit_length())][-2::-1]
Rework range: range(x.bit_length() - 2, -1, -1).
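With the reworked range the helper reduces to a single comprehension. A sketch (without the Mako context argument) matching the behaviour of the original [-2::-1] slice:

```python
def ilog2range(x):
    # Descending powers of two strictly below x, e.g. the per-step
    # strides of a tree reduction over a power-of-two block of size x
    return [2**i for i in range(x.bit_length() - 2, -1, -1)]
```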
pyfr/backends/opencl/blasext.py
Outdated
block = (128, 1, 1)

# Determine the number of groups
grid = ((ncolb + block[0] - 1) // block[0], 1, 1)
Expression is different to that used by the CUDA backend, check equivalence. We also do not use this variable properly.
Changed so that it is passed to the kernel.
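On the equivalence question: for non-negative integers both spellings compute the ceiling of ncolb divided by the block size, though the CUDA form round-trips through a float and can lose precision for very large counts. A small check (function names are illustrative):

```python
def ngroups_cuda(n, b):
    # int((n + b - 1)/b): true division then truncation, as in the CUDA backend
    return int((n + b - 1)/b)

def ngroups_ocl(n, b):
    # (n + b - 1)//b: exact integer floor division, as in the OpenCL backend
    return (n + b - 1)//b
```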
pyfr/backends/opencl/blasext.py
Outdated
# Norm type
reduce_expr = 'a + b' if norm == 'l2' else 'max(a, b)'
# Reduction workgroup dimensions
block = (128, 1, 1)
I do not believe that PyOpenCL requires the additional dimensions if they're 1.
File "/home/niki/python3venv/venv/lib/python3.5/site-packages/pyopencl-2016.1-py3.5-linux-x86_64.egg/pyopencl/cffi_cl.py", line 1192, in enqueue_nd_range_kernel
    work_dim = len(global_work_size)
TypeError: object of type 'int' has no len()
Try (128,).
pyfr/integrators/dual/controllers.py
Outdated
@@ -15,7 +15,9 @@ def __init__(self, *args, **kwargs):
self._fnsteps = self.cfg.getint('soln-filter', 'nsteps', '0')

# Stats on the most recent step
self.stepinfo = []
self.pseudostepinfo = []
# Pseudo-step counter
Add newline.
Also dump into stats file (see collect_stats method).
Do you mean dumping the global iteration number self.npseudosteps into stats?
pyfr/integrators/dual/controllers.py
Outdated
self._pseudo_aresid = self.cfg.getfloat(sect, 'pseudo-aresid')
self._pseudo_rresid = self.cfg.getfloat(sect, 'pseudo-rresid')
self._pseudo_residtol = self.cfg.getfloat(sect, 'pseudo-resid-tol')
self._norm = self.cfg.get(sect, 'pseudo-resid-norm', 'uniform')
Prefix with pseudo.
pyfr/plugins/pseudostats.py
Outdated
# The root rank needs to open the output file
if rank == root:
    self.outf = init_csv(self.cfg, cfgsect,
                         'total_niter,t,pseudo-iter,' + fvars)
Inconsistent column names.
Force-pushed 99d4850 to 03804bb (compare)
Now npseudosteps is dumped into the stats file. Because the writer plugin is called for t = 0.0 in the __init__ of BaseIntegrator, I need to initialise npseudosteps = 0 there. I can also do it in DualController/BaseDualIntegrator by switching the order in def __init__(self, *args, **kwargs). Which one do you prefer?
pyfr/backends/opencl/blasext.py
Outdated
self._retarr = rkern(xarr, yarr, zarr, atol, rtol,
                     queue=qcomp)
rkern(queue.cl_queue_comp, (ncolb, 1, 1), block,
Might be able to get away with (ncolb,)
Putting it in the __init__ function before we call super will be fine.