Skip to content

FIX: Fix incorrect type castings and mismatched operators #2585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

david-cortes-intel
Copy link
Contributor

Description

This PR fixes some issues in the logic of daal4py objects in which pointers were being casted incorrectly and operator delete was used incorrectly.

Note that this only avoids running into undefined behavior, but it does not fix memory leaks, which are left for a follow-up PR.


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

@david-cortes-intel david-cortes-intel requested a review from Vika-F July 4, 2025 09:47
@david-cortes-intel david-cortes-intel added the bug Something isn't working label Jul 4, 2025
@david-cortes-intel
Copy link
Contributor Author

/intelci: run

Copy link

codecov bot commented Jul 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Flag Coverage Δ
azure 79.81% <ø> (ø)
github 72.51% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@icfaust icfaust requested a review from Copilot July 4, 2025 11:38
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates pointer type signatures in Cython and refactors daal4py’s shared/raw pointer handling by replacing manual free functions and base classes with templated header-only functions.

  • Corrects Cython function signatures in gettree.pyx to use skl_tree_node* and double* instead of void*.
  • Refactors daal4py.h to introduce header-only templated functions (delete_daal_shared_ptr, set_sp_base, rawp_free_cap, set_rawp_base) and removes the old VSP/TVSP classes and free functions.
  • Cleans up daal4py.cpp by removing obsolete pointer free implementations.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/gettree.pyx Updated extern declarations and removed manual casts for node/value pointers.
src/daal4py.h Replaced custom VSP/TVSP and free functions with inline templates.
src/daal4py.cpp Deleted deprecated free function definitions for raw and shared pointers.
Comments suppressed due to low confidence (1)

src/daal4py.h:282

  • Template functions defined in a header should be marked inline to avoid potential multiple-definition linker errors when included in multiple translation units. Consider adding inline to delete_daal_shared_ptr (and the other templates).
template <class T>

src/daal4py.h Outdated
virtual ~TVSP() {};
};
daal::services::SharedPtr<T> * shared_ptr_for_capsule = new daal::services::SharedPtr<T>();
*shared_ptr_for_capsule = std::move(sp);
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using std::move here transfers ownership out of the original shared_ptr, leaving the caller’s sp empty. To preserve expected semantics (a copy), assign sp directly instead of moving: *shared_ptr_for_capsule = sp;

Suggested change
*shared_ptr_for_capsule = std::move(sp);
*shared_ptr_for_capsule = sp;

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@icfaust icfaust Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bad recommendation due to the nature of the shared pointer. We want the capsule to own it, and we don't want the ref count to increase. If that's the case, could you leave a comment to this effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a move assignment operator defined in daal::services::SharedPtr in order to let this line work as expected?
Because for now daal::services::SharedPtr doesn't have a move assignment operator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use initialization from another shared pointer instead of move. It has the downside that there's now a slight inefficiency in that a shared pointer is created and the old one destroyed immediately after the return of this function, but that shouldn't have any noticeable impact.

@Vika-F But what's the idea behind using a custom shared pointer class instead of just std::shared_ptr?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry. I misread the changes.
Regarding own implementation of shared_ptr, there are two reasons:

  • it is a legacy part. Originally DAAL was C++03 compatible.
  • there might be some issue with illegal instruction due to compiling std::shared_pointer multiple times with various ISA optimizations, but I am not sure that it still remains.

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have the impacts a bit more detailed in the description. To my mind there are two issues with respect to undefined behavior:

  1. misuse of void in deleters, which is impacting return data from daal4py generally (all data is doing undefined behavior at deletion).
  2. Misuse of array generation in gettree, which impacts only the daal4py ensemble algorithms (which are already been replaced in sklearnex by a onedal interface) through the use of void*.

src/daal4py.h Outdated
virtual ~TVSP() {};
};
daal::services::SharedPtr<T> * shared_ptr_for_capsule = new daal::services::SharedPtr<T>();
*shared_ptr_for_capsule = std::move(sp);
Copy link
Contributor

@icfaust icfaust Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bad recommendation due to the nature of the shared pointer. We want the capsule to own it, and we don't want the ref count to increase. If that's the case, could you leave a comment to this effect?

@@ -98,8 +98,8 @@ cdef class pyTreeState(object):
self.node_count = treeState.node_count
self.leaf_count = treeState.leaf_count
self.class_count = treeState.class_count
self.node_ar = self._get_node_ndarray(<void*> treeState.node_ar, treeState.node_count)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the cast to void* is now occurring in PyArray_SimpleNewFromData, rather than a function we define. I assume this is for UBSan to not complain? (I assume this is related to UBSan, if so it may be good to directly reference this in the description for posterity).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried running ubsan on this, but either way, static-casting a void pointer to a type other than the one it was casted from is undefined behavior, even if that type is a parent class of the type from which the void pointer was casted.

See cppreference:
https://en.cppreference.com/w/cpp/language/pointer.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry, I am onboard with the changes and know why this was not kosher. I guess I was trying to get at how you came about it (i.e. if you'd use a specific tool) such that future development would be aware of what has been done and what approaches have been taken in analyzing the codebase.

@icfaust
Copy link
Contributor

icfaust commented Jul 4, 2025

Also merge main to fix the CI failures.

@icfaust
Copy link
Contributor

icfaust commented Jul 4, 2025

private CI is acceptable.

@icfaust
Copy link
Contributor

icfaust commented Jul 7, 2025

/intelci: run

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay. Typical unrelated failures in private CI, thanks for the work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants