issues Search Results · repo:kubeflow/trainer language:Python
1k results
What you would like to be added?
Add a GitHub action workflow for:
- Publishing Helm charts to GHCR, so users can install the trainer Helm chart by specifying the OCI image repository and
version:
...
kind/feature
lifecycle/needs-triage
ChenYi015
- Opened 2 days ago
- #2488
What happened?
During testing, I noticed the following error when running list_runtimes() APIs:
File /Users/avelichk/go/src/github.com/kubeflow/trainer/sdk/kubeflow/trainer/models/trainer_v1alpha1_ml_policy.py ...
area/sdk
kind/bug
andreyvelich
- 1
- Opened 3 days ago
- #2485
In https://github.com/kubeflow/trainer/blob/master/CONTRIBUTING.md, paths like:
kubectl apply --server-side -k github.com/kubeflow/training-operator/manifests/overlays/standalone
do not exist
Okabe-Rintarou-0
- 3
- Opened 3 days ago
- #2480
What happened?
As we found in our E2E tests, the ClusterTrainingRuntime factory randomly fails as follows.
{"level":"error","ts":"2025-03-05T19:26:42.084342655Z","caller":"runtime/signal_unix.go:917" ...
kind/bug
tenzen-y
- 1
- Opened 4 days ago
- #2477
What you would like to be added?
As I mentioned in
https://github.com/kubeflow/trainer/blob/3ec8f0705f515269b5ab8744c20b9d085f50d1ce/pkg/runtime/framework/core/framework_test.go#L51-L53,
it would be better ...
area/controller
kind/feature
tenzen-y
- 2
- Opened 6 days ago
- #2468
What you would like to be added?
Since we updated JobSet to v0.8.0, we should refactor the controller code to support DependsOn API for the Initializer
Job and MPI orchestration.
/assign @andreyvelich ...
area/controller
kind/feature
andreyvelich
- Opened 6 days ago
- #2467
What you would like to be added?
We should explore the uv project manager for the Kubeflow Python SDK. It is faster than other tools, and many Python
libraries have started adopting it.
In particular, ...
area/sdk
kind/discussion
kind/feature
andreyvelich
- 2
- Opened 9 days ago
- #2462
What happened?
Flaky Integration Test: TestDatasetIntegration.test_dataset_download[HuggingFace - Public dataset-huggingface-test_case0]
- https://github.com/kubeflow/trainer/actions/runs/13595830014/job/38012401152 ...
area/testing
good first issue
help wanted
kind/bug
tenzen-y
- 7
- Opened 9 days ago
- #2460
What you would like to be added?
It would be great to reconsider the TrainJob Created condition. The tentative alternative candidates are Initialized and
ComponentsCreated, as we discussed in https://github.com/kubeflow/trainer/pull/2439#discussion_r1959297527. ...
kind/feature
tenzen-y
- Opened 9 days ago
- #2459
What you would like to be added?
It would be great to add Kubeflow TrainerPipelineFramework documentation to
https://www.kubeflow.org/docs/components/trainer/operator-guides/
Why is this needed?
We ...
area/docs
good first issue
help wanted
kind/documentation
tenzen-y
- 7
- Opened 9 days ago
- #2458
