
Fixed a bug in PETScVector #1797

Merged
merged 15 commits into ufz:master from wenqing:petsc_fixing on Jun 9, 2017

Conversation

@wenqing
Member

wenqing commented May 5, 2017

The bug is in PETScVector::get, which calls VecGetValues. VecGetValues can only read values that reside on the calling processor. To fix the bug, the current implementation introduces an array, _entry_array, that stores the entries of the vector fetched across partitions via a scatter operation.

_entry_array contains either
- the entries of the global vector owned by the current rank, if the vector is created with given ghost IDs, or
- all entries of the global vector, if the vector is created without ghost IDs.

_entry_array is only allocated if entries of the vector are going to be fetched. For a vector created with given ghost IDs, a lookup table (a std::map<PetscInt, PetscInt>) that maps the global indices of the ghost entries to local ones is created when _entry_array is allocated. This keeps the memory usage low.
TODO: Use PetscScalar* PETScVector::getLocalVector() (which calls VecGhostGetLocalForm) instead of a scattered global array. To do this, the local node indices of the partitions have to be added to the DOF table, or the local indices of the ghost entries have to be computed on the fly.

Note: In the current implementation of the PDE-solving part, all global vectors are created with given ghost IDs. Therefore, in the computation of physical process modelling, _entry_array stays local to the partition. A sketch of the scatter-based fetch follows.
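For reference, here is a minimal sketch of the scatter-based fetch described above. The helper name copyEntriesToArray and the allocation details are illustrative only; the PETSc calls and the std::map lookup are the ones named in this description.

#include <petscvec.h>

#include <map>
#include <vector>

// Gather all entries of a distributed PETSc vector into a plain array on every
// rank, so that a later get(i) can also read entries owned by other ranks.
std::vector<PetscScalar> copyEntriesToArray(Vec v)
{
    PetscInt global_size;
    VecGetSize(v, &global_size);

    // Scatter the distributed vector into a sequential vector on each rank.
    VecScatter scatter;
    Vec v_all;
    VecScatterCreateToAll(v, &scatter, &v_all);
    VecScatterBegin(scatter, v, v_all, INSERT_VALUES, SCATTER_FORWARD);
    VecScatterEnd(scatter, v, v_all, INSERT_VALUES, SCATTER_FORWARD);

    const PetscScalar* raw = nullptr;
    VecGetArrayRead(v_all, &raw);
    std::vector<PetscScalar> entry_array(raw, raw + global_size);
    VecRestoreArrayRead(v_all, &raw);

    VecScatterDestroy(&scatter);
    VecDestroy(&v_all);
    return entry_array;
}

// With given ghost IDs, only the owned entries are stored and a lookup table
// maps the global indices of ghost entries to local positions, e.g.:
//   std::map<PetscInt, PetscInt> global_ids2local_ids_ghost;
//   PetscScalar v_i = entry_array[global_ids2local_ids_ghost.at(global_id)];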

@wenqing wenqing requested review from endJunction and TomFischer May 5, 2017

@wenqing wenqing force-pushed the wenqing:petsc_fixing branch 2 times, most recently from 0719aff to 57569e1 May 8, 2017

@wenqing


Member

wenqing commented May 9, 2017

2d_th_cmp

@wenqing wenqing force-pushed the wenqing:petsc_fixing branch 3 times, most recently from 74b19bb to 16fcd71 May 10, 2017

@wenqing


Member

wenqing commented May 11, 2017

@TomFischer All tests passed.

@TomFischer


Member

TomFischer commented May 11, 2017

Although with your changes the results of the linear solver look good now, the non-linear solver does not converge in the first time step. Additionally, the parallel linear solver (PETSc bcgs + diagonal preconditioner) still needs many more iterations than the serial linear solver (LIS or Eigen: bicgst + diagonal preconditioner). I guess there is another mistake, at least in the branch https://github.com/TomFischer/ogs/tree/PartitionCellProperties.

@wenqing


Member

wenqing commented May 11, 2017

@TomFischer The linear convergence criteria implemented in the different solver packages differ. Therefore, a difference in the number of solver iterations is normal. For the convergence of the non-linear solver, the error calculation under MPI has to be checked.

The included test is a non-linear problem, which is actually similar to the 3D case that you are testing.

@TomFischer


Member

TomFischer commented May 12, 2017

@wenqing The difference in the number of iterations is huge: the serial algorithm (bicgst + diagonal preconditioner) takes 3 iterations, whereas the parallel algorithm takes more than 2300 iterations. I think a small difference between serial and parallel algorithms is ok, but this is not a small difference.

@wenqing wenqing force-pushed the wenqing:petsc_fixing branch from 16fcd71 to 144f558 May 22, 2017

@wenqing wenqing force-pushed the wenqing:petsc_fixing branch from 144f558 to 3d52aba May 29, 2017

@wenqing


Member

wenqing commented May 30, 2017

@TomFischer Using a different linear solver tolerance solved the problem.
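The conversation does not show which tolerance was changed or where it is configured. Purely as an illustration, and assuming direct access to the KSP object, the relative tolerance of a PETSc Krylov solver can be tightened as follows; in practice OGS would set this via the project file or PETSc runtime options such as -ksp_rtol.

#include <petscksp.h>

// Illustrative only: tighten the relative tolerance of a PETSc Krylov solver,
// leaving the absolute/divergence tolerances and the iteration limit unchanged.
void tightenRelativeTolerance(KSP ksp)
{
    KSPSetTolerances(ksp, 1e-12, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);
}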

@wenqing wenqing added please review and removed WIP 🏗 labels May 30, 2017

@wenqing


Member

wenqing commented May 30, 2017

The 3D test (20 CPUs) finished; the log output at the end of the computation reads:

info: [time] Time step #694 took 2.26019 s.
...
info: [time] Execution took 1694.31 
@endJunction


Member

endJunction commented May 30, 2017

@wenqing Is this implementation using the global vector on each thread?

@wenqing


Member

wenqing commented May 31, 2017

@endJunction Yes, it is. TODO: Use PetscScalar* PETScVector::getLocalVector() (which calls VecGhostGetLocalForm) instead of a scattered global array. To do this, the local node indices of the partitions have to be added to the DOF table, or they are calculated on the fly.
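A minimal sketch of that TODO, assuming the vector was created as a ghosted PETSc vector (e.g. via VecCreateGhost); the function name readLocalEntry and the direct use of a local index are illustrative:

#include <petscvec.h>

// Read one entry through the ghosted local form instead of a scattered copy of
// the whole global vector. In the local form the owned entries come first,
// followed by the ghost entries, which is why local (ghost) indices are needed.
PetscScalar readLocalEntry(Vec v, PetscInt local_index)
{
    // Make sure the ghost entries hold the current values of their owners.
    VecGhostUpdateBegin(v, INSERT_VALUES, SCATTER_FORWARD);
    VecGhostUpdateEnd(v, INSERT_VALUES, SCATTER_FORWARD);

    Vec v_local;
    VecGhostGetLocalForm(v, &v_local);

    const PetscScalar* loc_x = nullptr;
    VecGetArrayRead(v_local, &loc_x);
    PetscScalar const value = loc_x[local_index];  // owned or ghost entry
    VecRestoreArrayRead(v_local, &loc_x);

    VecGhostRestoreLocalForm(v, &v_local);
    return value;
}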

@chleh

Many comments, most of them regarding whether some setLocalAccessibleVector() calls can be avoided; if you want, we can also discuss that.
Some of the comments might already be obsolete by your subsequent commits.
If you follow my comments, it might be necessary to call setLocalAccessibleVector here

@@ -38,6 +38,8 @@ PETScVector::PETScVector(const PetscInt vec_size, const bool is_global_size)
}
config();
_global_v = std::unique_ptr<PetscScalar[]>{ new PetscScalar[_size] };


@chleh

chleh May 31, 2017

Contributor

Note: Now you can use std::make_unique<PetscScalar[]>(_size);
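A minimal illustration of the suggested C++14 form (the function name makeEntryBuffer is hypothetical); note that the array form of make_unique value-initializes the elements, unlike the original new-expression:

#include <cstddef>
#include <memory>

#include <petscvec.h>  // PetscScalar

// Equivalent to std::unique_ptr<PetscScalar[]>{new PetscScalar[size]()}.
std::unique_ptr<PetscScalar[]> makeEntryBuffer(std::size_t size)
{
    return std::make_unique<PetscScalar[]>(size);
}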


@wenqing

wenqing Jun 8, 2017

Member

Renamed _global_v to _entry_array, and changed its type to std::vector.

return value;
const PetscInt id_p = (idx == -_size) ? 0 : std::abs(idx);
//VecGetValues(_v, 1, &id_p, &value);
return _global_v[id_p];


@chleh

chleh May 31, 2017

Contributor

Please add a Doxygen comment that now the global vector must be available before using this method.
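An illustrative Doxygen comment along the requested lines; the wording and the exact signature are assumptions, not the comment actually added in the PR:

/// Get the value of an entry by its global index.
/// \pre setLocalAccessibleVector() (or a scatter of the vector entries) must
///      have been performed beforehand, so that entries owned by other ranks
///      are available locally.
PetscScalar get(const PetscInt idx) const;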


@wenqing

wenqing Jun 8, 2017

Member

Added.

and its associated computations can be dropped if
VecGetValues can get values from different processors.
*/
std::unique_ptr<PetscScalar[]> _global_v = nullptr;


@chleh

chleh May 31, 2017

Contributor

Minor: = nullptr is not needed with unique_ptr.


@wenqing

wenqing Jun 8, 2017

Member

--> mutable std::vector<PetscScalar> _entry_array;

VecGetValues(_v, 1, &idx, &value);
return value;
const PetscInt id_p = (idx == -_size) ? 0 : std::abs(idx);
//VecGetValues(_v, 1, &id_p, &value);


@chleh

chleh May 31, 2017

Contributor

I'd remove the commented out code.


@wenqing

wenqing Jun 8, 2017

Member

removed.

PetscScalar x;
VecGetValues(_v, 1, &idx, &x);
return x;
//PetscScalar x;


@chleh

chleh May 31, 2017

Contributor

please remove the commented out code.


@wenqing

wenqing Jun 8, 2017

Member

Removed.

@@ -589,7 +589,7 @@ bool UncoupledProcessesTimeLoop::loop()
for (auto& spd : _per_process_data)
{
auto& pcs = spd->process;
auto const& x0 = *_process_solutions[pcs_idx];
auto& x0 = *_process_solutions[pcs_idx];


@chleh

chleh May 31, 2017

Contributor

sorry, I don't see why this is not const anymore. At least preTimestep() and doOutput() seem to not have changed in this PR.

@@ -638,7 +638,7 @@ bool UncoupledProcessesTimeLoop::loop()
for (auto& spd : _per_process_data)
{
auto& pcs = spd->process;
auto const& x = *_process_solutions[pcs_idx];
auto& x = *_process_solutions[pcs_idx];


@chleh

chleh May 31, 2017

Contributor

Can it be kept const?

// The function only has computation if DDC is appied,
// e.g. Parallel comuting.
MathLib::LinAlg::setLocalAccessibleVector(x);


@chleh

chleh May 31, 2017

Contributor

Isn't that already done in the nonlinear solver?

// The function only has computation if DDC is appied,
// e.g. Parallel comuting.
MathLib::LinAlg::setLocalAccessibleVector(x);


@chleh

chleh May 31, 2017

Contributor

Likewise: Isn't that already done in the nonlinear solver?

@@ -522,6 +522,7 @@ bool UncoupledProcessesTimeLoop::setCoupledSolutions()
const std::size_t c_id =
std::distance(_per_process_data.begin(), found_item);
MathLib::LinAlg::setLocalAccessibleVector(*_process_solutions[c_id]);


@chleh

chleh May 31, 2017

Contributor

Is that necessary? Isn't that already taken care of by the nonlinear solver or by setInitialConditions()?

@wenqing wenqing added WIP 🏗 and removed please review labels May 31, 2017

@TomFischer


Member

TomFischer commented Jun 1, 2017

I used the commits in my branch and some tests are failing. For this reason I checked out this branch, and MPITest_Math.CheckInterface_PETScVector is already failing with the changes of this PR.

mpirun -np 3 valgrind --tool=memcheck bin/testrunner --gtest_filter=MPITest_Math.CheckInterface_PETScVector

Valgrind reports an invalid read at PETScVector.h in line 192. I wonder why the checks passed.

@wenqing


Member

wenqing commented Jun 1, 2017

@TomFischer Thanks. I will check it.

@wenqing


Member

wenqing commented Jun 1, 2017

@TomFischer Tests are corrected now.

@wenqing


Member

wenqing commented Jun 1, 2017

@TomFischer I am changing this PR to use a local array and also to address Christoph's comments. You can use commit a145d53 to make the tests of your branch run.

// a rank in the parallel computing.
OGS_FATAL("Global index in the system of equations"
" can only be numbered by the oder type"
" of ComponentOrder::BY_LOCATION");


@endJunction

endJunction Jun 8, 2017

Member

minor: One could reorder the if and else parts for early exit...

if (order != BY_LOCATION)
{
    OGS_FATAL("....")
}
global_id = ...
is_ghost = ..


@wenqing

wenqing Jun 9, 2017

Member

Took the suggestion.

if (_global_ids2local_ids_ghost.size() > 0)
{
double* loc_x = getLocalVector();


@endJunction

endJunction Jun 8, 2017

Member

minor: Is this a double* or PetscScalar*?


@wenqing

wenqing Jun 9, 2017

Member

Changed all double to PetscScalar in this class.

@endJunction

Some minor questions, but fine in general.
Very nice that this bug is fixed!

/**
Set local accessible vector in order to get entries. Call this
before call operator[] or get(...) of x.
The function only has computation if DDC is appied,


@TomFischer

TomFischer Jun 9, 2017

Member

Spelling: appied -> applied or enabled ?


@wenqing

wenqing Jun 9, 2017

Member

Changed to 'enabled'.

e.g. parallel computing. Up to now, Eigen vector is not used for
global vectors in parallel computing. Therefore this function is
empty in the current status.
*/


@TomFischer

TomFischer Jun 9, 2017

Member

Maybe you could mention that the purpose of this overloaded function is to fulfil the lin alg interface (is this correct?). For this reason, it has an empty implementation.


@wenqing

wenqing Jun 9, 2017

Member

I would like to leave it as it is, because we may want another parallel approach that does not use the PETSc solver. For such a parallel computing case without the PETSc solver, using the Eigen vector type, this function can then be filled in.
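A sketch of the no-op overload under discussion; the namespace and exact signature are assumptions based on the calls shown elsewhere in this PR:

namespace MathLib { class EigenVector; }  // forward declaration for this sketch

namespace MathLib
{
namespace LinAlg
{
// Intentionally empty: Eigen vectors are not distributed, so every entry is
// already locally accessible. The overload only keeps the linear-algebra
// interface uniform across the serial and the PETSc backends.
inline void setLocalAccessibleVector(EigenVector const& /*x*/) {}
}  // namespace LinAlg
}  // namespace MathLib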

_size_loc {other._size_loc},
_size_ghosts {other._size_ghosts},
_has_ghost_id {other._has_ghost_id},
_global_ids2local_ids_ghost {other._global_ids2local_ids_ghost}
{}


@TomFischer

TomFischer Jun 9, 2017

Member

Clang format looks like this:

PETScVector::PETScVector(PETScVector&& other)
    : _v{std::move(other._v)},
      _v_loc{std::move(other._v_loc)},
      _start_rank{other._start_rank},
      _end_rank{other._end_rank},
      _size{other._size},
      _size_loc{other._size_loc},
      _size_ghosts{other._size_ghosts},
      _has_ghost_id{other._has_ghost_id}


@wenqing

wenqing Jun 9, 2017

Member

Will be changed later.

std::vector<double> get(std::vector<IndexType> const& indices) const
{
std::vector<double> local_x(indices.size());
// TODO preliminary


@TomFischer

TomFischer Jun 9, 2017

Member

Is this comment needed? If yes, could you please explain in a little more detail what has to be done?


@wenqing

wenqing Jun 9, 2017

Member

This was added by Christoph. Let's just leave it here temporarily.

@@ -53,10 +54,11 @@ GlobalVector const& GlobalVectorFromNamedFunction::call(
_context.index = node_id;
auto const value = _function_caller.call(args);
// TODO Problems with PETSc? (local vs. global index)
// Problems with PETSc also uses global index.


@TomFischer

TomFischer Jun 9, 2017

Member

I don't understand the comment. Please, can you explain it in more detail?


@wenqing

wenqing Jun 9, 2017

Member

Dropped. It just answered the deleted comment (original line 56); I should have written it as a comment in this PR instead.

@TomFischer

Some small issues to discuss, after that: 👍

@wenqing


Member

wenqing commented Jun 9, 2017

Thanks for the comments from all of you.

@wenqing


Member

wenqing commented Jun 9, 2017

The last commit corrects test StaggeredTH_ThermalDensityDrivenFlow2D. Now all tests on jenkins should be green.

@wenqing


Member

wenqing commented Jun 9, 2017

Tests on Jenkins (running in the background) finished without errors. This PR is going to be merged soon.

@wenqing wenqing force-pushed the wenqing:petsc_fixing branch from 7a87285 to b17a217 Jun 9, 2017

@wenqing wenqing removed the please review label Jun 9, 2017

@wenqing wenqing merged commit e203766 into ufz:master Jun 9, 2017

3 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/jenkins/pr-merge: This commit looks good
continuous-integration/travis-ci/pr: The Travis CI build passed