Skip to content

Conversation

@njnu-seafish
Copy link
Contributor

Purpose of the pull request

close #17758

Brief change log

Add logic to handle tasks that fail during initialization.

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

Pull Request Notice

If your pull request contains incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

@SbloodyS SbloodyS changed the title [Fix-17758] [Master]Add logic to handle tasks that fail during initialization. [Fix-17758][Master] Add logic to handle tasks that fail during initialization. Dec 3, 2025
@SbloodyS SbloodyS added the bug Something isn't working label Dec 3, 2025
@SbloodyS SbloodyS added this to the 3.4.0 milestone Dec 3, 2025
Copy link
Member

@ruanwenjun ruanwenjun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should fire a TaskFailedLifecycleEvent when task initialize failed.

@njnu-seafish
Copy link
Contributor Author

Should fire a TaskFailedLifecycleEvent when task initialize failed.

yes, add logic to handle TaskFatalLifecycleEvent

@AllArgsConstructor
public class TaskFatalLifecycleEvent extends AbstractTaskLifecycleEvent {

private final ITaskExecutionRunnable taskExecutionRunnable;

Check notice

Code scanning / CodeQL

Missing Override annotation Note

This method overrides
AbstractTaskLifecycleEvent.getTaskExecutionRunnable
; it is advisable to add an Override annotation.
@github-actions github-actions bot added the test label Dec 4, 2025
Comment on lines +114 to +124
try {
taskExecutionRunnable.initializeTaskExecutionContext();
} catch (Exception ex) {
log.error("Current taskInstance: {} initializeTaskExecutionContext error", taskInstance.getName(), ex);
final TaskFatalLifecycleEvent taskFatalEvent = TaskFatalLifecycleEvent.builder()
.taskExecutionRunnable(taskExecutionRunnable)
.endTime(new Date())
.build();
taskExecutionRunnable.getWorkflowEventBus().publish(taskFatalEvent);
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to create a TaskFatalException, and throw TaskFatalException here and handle TaskFatalException at WorkflowEventBusFireWorker.

@ruanwenjun ruanwenjun requested a review from Copilot December 5, 2025 02:21
Copilot finished reviewing on behalf of ruanwenjun December 5, 2025 02:24
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new mechanism to handle tasks that fail during initialization by introducing a TaskFatalLifecycleEvent. The implementation distinguishes fatal initialization failures from regular task failures, ensuring that unrecoverable errors (like missing environment configurations) do not trigger retry logic and are handled appropriately in the workflow execution.

  • Adds TaskFatalLifecycleEvent and its handler to manage initialization failures
  • Updates task state machine to handle fatal events and properly fail workflows or continue if all successors are condition tasks
  • Adds comprehensive integration tests covering fatal task scenarios with condition tasks

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
TaskFatalLifecycleEvent.java New lifecycle event class representing fatal unrecoverable errors during task initialization
TaskFatalLifecycleEventHandler.java Handler that processes fatal events and delegates to state actions
TaskLifecycleEventType.java Added FATAL enum value for the new event type
ITaskStateAction.java Added onFatalEvent interface method for state actions
AbstractTaskStateAction.java Default implementation of fatal event handling with workflow graph logic
TaskSubmittedStateAction.java Added try-catch around initializeTaskExecutionContext to publish fatal events on exceptions
WorkflowStartTestCase.java Added three new integration tests for fatal task scenarios
workflow_with_one_fake_task_fatal.yaml Test fixture for single fatal task scenario
workflow_with_one_condition_task_with_one_fake_predecessor_fatal.yaml Test fixture for condition task with fatal predecessor
workflow_with_one_forbidden_condition_task_with_one_fake_predecessor_fatal.yaml Test fixture for forbidden condition task with fatal predecessor
workflow_with_one_forbidden_condition_task_with_one_fake_predecessor_failed.yaml Updated description for clarity to indicate the condition task is forbidden

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

taskType: LogicFakeTask
taskParams: '{"localParams":null,"varPool":[],"shellScript":"echo success"}'
workerGroup: default
environmentCode : 144873539254368
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent spacing around colon. The key environmentCode should be followed by a colon without a space before it, consistent with other YAML files in this directory. Change environmentCode : to environmentCode:.

Suggested change
environmentCode : 144873539254368
environmentCode: 144873539254368

Copilot uses AI. Check for mistakes.
# limitations under the License.
#

# A(fatal-failed) -> B(success) -> D(success)
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment incorrectly describes the workflow structure. Based on the taskRelations, the structure is: A(fatal-failed) -> B(Condition) -> C(success) or D(failed). The comment should be updated to: # A(fatal-failed) -> B(Condition) -> C(success) or D(failed).

Suggested change
# A(fatal-failed) -> B(success) -> D(success)
# A(fatal-failed) -> B(Condition) -> C(success) or D(failed)

Copilot uses AI. Check for mistakes.
# limitations under the License.
#

# A(failed) -> B(Condition)(forbidden) -> C(success)
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "A(failed)" but based on the filename and test description, this should be "A(fatal)". The comment should be updated to: # A(fatal) -> B(Condition)(forbidden) -> C(success).

Suggested change
# A(failed) -> B(Condition)(forbidden) -> C(success)
# A(fatal) -> B(Condition)(forbidden) -> C(success)

Copilot uses AI. Check for mistakes.
Comment on lines +109 to +112
final TaskInstance taskInstance = taskExecutionRunnable.getTaskInstance();
taskInstance.setState(TaskExecutionStatus.FAILURE);
taskInstance.setEndTime(taskFatalEvent.getEndTime());
taskInstanceDao.updateById(taskInstance);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] For consistency with other event handlers in this class (e.g., onFailedEvent, onPausedEvent, onKilledEvent), consider extracting the persistence logic into a private helper method like persistentTaskInstanceFatalEventToDB. This improves code maintainability and follows the established pattern in this class.

Copilot uses AI. Check for mistakes.
@AllArgsConstructor
public class TaskFatalLifecycleEvent extends AbstractTaskLifecycleEvent {

private final ITaskExecutionRunnable taskExecutionRunnable;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method overrides AbstractTaskLifecycleEvent.getTaskExecutionRunnable; it is advisable to add an Override annotation.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend bug Something isn't working test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Master] If a task fails during initialization, it will neither be dispatched by the Master nor can it be properly killed.

3 participants