
[Feature]: Split and shorten long CI jobs (e.g. entrypoints, spec decodes, kernels, etc.) #20218

@kaori-seasons

Description


🚀 The feature, motivation and pitch

Related to issue #16284

Analysis of problems in the current CI test process

The vLLM project currently uses Buildkite as the main CI system, which has the following problems:

  1. Long-running tests: some tests run so long that even jobs with a 2-hour budget end up skipped

  2. Coarse test classification: tests are already grouped into entrypoints, speculative decoding, kernels, and so on, but there is no effective grouping by execution time

  3. Insufficient parallelism: the current CI configuration does not fully exploit Buildkite's parallel execution capabilities

Detailed design plan

1. Test layering strategy

First layer: Fast Check

  • Based on the existing fast_check tag
  • Execution time: < 5 minutes
  • Includes: basic functional tests, unit tests, fast integration tests

Second layer: Standard Tests

  • Execution time: 5-20 minutes
  • Includes: most functional tests, API tests

Third layer: Extended Tests

  • Execution time: 20-60 minutes
  • Includes: complex scenario testing, performance regression testing

Fourth layer: Nightly Tests

  • Execution time: > 60 minutes
  • Includes: large-scale tests, stress tests, and full performance benchmarks

2. Split strategy by functional module

Entrypoint test module split

  • entrypoints-llm: LLM interface testing
  • entrypoints-openai: OpenAI API compatibility testing
  • entrypoints-offline: offline mode testing

Speculative decoding test module split

  • spec-decode-core: core speculative decoding functionality
  • spec-decode-e2e: end-to-end tests
  • spec-decode-performance: performance tests

Kernel test module split

  • kernels-attention: attention kernels
  • kernels-quantization: quantization kernels
  • kernels-moe: mixture-of-experts (MoE) kernels
  • kernels-core: core operation kernels
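The module splits above could be driven by a simple mapping from job name to test directory, from which each CI job's command is generated. A minimal sketch (the paths are illustrative, not vLLM's actual test layout):

```python
# Hypothetical mapping from split job names to test directories;
# paths are examples only, not vLLM's real layout.
MODULES = {
    "entrypoints-llm": "tests/entrypoints/llm",
    "entrypoints-openai": "tests/entrypoints/openai",
    "spec-decode-core": "tests/spec_decode",
    "kernels-attention": "tests/kernels/attention",
    "kernels-moe": "tests/kernels/moe",
}

def ci_command(job: str) -> str:
    """Build the pytest invocation for one split CI job."""
    return f"pytest -q {MODULES[job]}"
```

Keeping this mapping in one place makes it easy to add or re-balance jobs without touching each Buildkite step by hand.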

3. Hardware Platform Parallelization Strategy

Based on the multiple hardware platforms supported by the project:

GPU test grouping

  • gpu-single: Single GPU test (L4, A100, etc.)
  • gpu-multi: Multi-GPU distributed test
  • gpu-memory: Large memory requirement test

CPU test grouping

  • cpu-x86: x86 architecture CPU test
  • cpu-arm: ARM architecture CPU test

Specialized hardware test

  • tpu-tests: TPU platform test
  • neuron-tests: AWS Neuron test
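Routing tests into these hardware groups can be done with `pytest.mark.skipif`. A minimal sketch, assuming a hypothetical helper that counts visible GPUs (real CI would query `torch.cuda.device_count()` instead):

```python
import os

import pytest

def visible_gpu_count() -> int:
    # Hypothetical helper: parse CUDA_VISIBLE_DEVICES; a real CI job
    # would query torch.cuda.device_count() instead.
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in devices.split(",") if d.strip()])

# gpu-multi group: skip unless at least two GPUs are visible.
requires_multi_gpu = pytest.mark.skipif(
    visible_gpu_count() < 2, reason="needs >= 2 GPUs"
)

@requires_multi_gpu
def test_distributed_smoke():
    ...
```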

4. Timeout and retry mechanism optimization

Use the pytest-timeout plugin:

  • Fast tests: 5-minute timeout, one immediate retry on failure
  • Standard tests: 20-minute timeout, one retry on failure
  • Extended tests: 60-minute timeout, one retry on failure
  • Nightly tests: 180-minute timeout, no automatic retry; record detailed logs on failure
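These policies map directly onto markers, assuming pytest-timeout for the limits and pytest-rerunfailures for the retries (the plugin choices are suggestions, not what vLLM currently uses):

```python
import pytest

# Per-tier timeout (seconds) and retry policy; assumes the pytest-timeout
# and pytest-rerunfailures plugins. The nightly tier gets no automatic retry.
fast = [pytest.mark.timeout(300), pytest.mark.flaky(reruns=1)]
standard = [pytest.mark.timeout(1200), pytest.mark.flaky(reruns=1)]
extended = [pytest.mark.timeout(3600), pytest.mark.flaky(reruns=1)]
nightly = [pytest.mark.timeout(10800)]

@pytest.mark.timeout(300)
@pytest.mark.flaky(reruns=1)
def test_fast_tier_example():
    assert 2 + 2 == 4
```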

5. Test selection and marking strategy

Extend the existing pytest marking system:

Suggested new marks:
- @pytest.mark.fast (< 5min)
- @pytest.mark.standard (5-20min)
- @pytest.mark.extended (20-60min)
- @pytest.mark.nightly (> 60min)
- @pytest.mark.gpu_intensive
- @pytest.mark.memory_heavy
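Registering the proposed marks in `conftest.py` keeps `pytest --strict-markers` satisfied and makes layer selection (`pytest -m fast`, `-m nightly`) explicit. A sketch, using the marker names from the proposal (they are not existing vLLM markers):

```python
# conftest.py sketch: register the proposed tier markers so that
# `pytest -m fast`, `-m nightly`, etc. each select a single layer.
TIER_MARKERS = {
    "fast": "completes in under 5 minutes",
    "standard": "completes in 5-20 minutes",
    "extended": "completes in 20-60 minutes",
    "nightly": "takes over 60 minutes; nightly pipeline only",
    "gpu_intensive": "needs substantial GPU compute",
    "memory_heavy": "needs large host or device memory",
}

def pytest_configure(config):
    for name, description in TIER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```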

6. CI pipeline restructuring plan

Pull Request triggered pipeline

  1. Fast check layer (parallel execution, 5-8 jobs)
  2. Standard test layer (conditional trigger, based on code changes)
  3. Critical path testing (always executed)
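The conditional trigger in step 2 could be computed from the PR's diff by mapping source-path prefixes to test groups. A sketch (the path prefixes are illustrative):

```python
# Hypothetical mapping from source-path prefixes to the standard-tier
# test groups a pull request should trigger; prefixes are examples only.
GROUP_TRIGGERS = {
    "kernels-attention": ("csrc/attention/", "vllm/attention/"),
    "spec-decode-core": ("vllm/spec_decode/",),
    "entrypoints-openai": ("vllm/entrypoints/openai/",),
}

def groups_for_changes(changed_files):
    """Return the test groups whose source areas a PR touches."""
    return sorted(
        group
        for group, prefixes in GROUP_TRIGGERS.items()
        if any(f.startswith(p) for p in prefixes for f in changed_files)
    )
```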

Pipeline triggered by master branch merge

  1. Complete fast + standard test

  2. Selective extension test

  3. Performance regression detection

Nightly scheduled pipeline

  1. Full test suite

  2. Performance benchmark test

  3. Stress test and stability test

7. Implementation steps

Phase 1: Test analysis and tagging

  1. Analyze the execution time distribution of existing tests

  2. Add time stratification tags to all tests

  3. Identify long tests that can be skipped or optimized

Phase 2: Pipeline refactoring

  1. Modify Buildkite configuration to increase parallel jobs

  2. Implement conditional trigger mechanism

  3. Configure timeout and retry strategy

Phase 3: Monitoring and optimization

  1. Establish CI execution time monitoring dashboard

  2. Continuously optimize test grouping and parallelism

  3. Adjust strategy based on feedback

Alternatives

No response

Additional context

solutions: https://docs.google.com/document/d/1gHMT8ZfNqpu67KrJ3DaNeC-mmdPmBRStK_9-g06hONs/edit?usp=sharing

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
