[Bug] ClassNotFoundException when loading user-defined resource classes from uploaded JARs #515

@klaudworks

Summary

When flink-agents-dist.jar is deployed in /opt/flink/lib (which is required), user-defined resource classes (e.g., custom ChatModel implementations) cannot be loaded from user JARs uploaded via the REST API, resulting in ClassNotFoundException.

Error Message

java.lang.ClassNotFoundException: com.example.AzureOpenAIChatModelSetup
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(Unknown Source)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Unknown Source)
    at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Unknown Source)
    at org.apache.flink.agents.plan.resourceprovider.JavaResourceProvider.provide(JavaResourceProvider.java:40)

Root Cause

The framework code in /opt/flink/lib is loaded by the System ClassLoader (the AppClassLoader in the stack trace above). User JARs uploaded at runtime are loaded by Flink's User ClassLoader, which is a child of the System ClassLoader.

The existing code calls Class.forName(className), which resolves the class through the caller's own classloader, here the System ClassLoader. Java classloaders delegate lookups upward to their parent and never downward to a child, so the System ClassLoader cannot see classes that exist only in the User ClassLoader.

Affected locations:

  • JavaResourceProvider.java - main resource instantiation
  • JavaSerializableResourceProvider.java - serializable resource deserialization
  • AgentPlan.java - PythonResourceWrapper class checks
  • ActionJsonDeserializer.java - parameter type and config deserialization
  • FunctionToolJsonDeserializer.java - parameter type deserialization
  • EventLogRecordJsonDeserializer.java - event class deserialization

Solution

Use the Thread Context ClassLoader (TCCL) instead:

Class.forName(className, true, Thread.currentThread().getContextClassLoader())

Flink sets the TCCL to the User ClassLoader before executing user code, making user-defined classes accessible to framework code.
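
To illustrate the lookup path, here is a minimal, self-contained sketch. It is not the code from the PR. The names ContextClassLoading, loadUserClass, and RecordingClassLoader are hypothetical, and the fallback to the class's own loader when no TCCL is set is an assumption. RecordingClassLoader only stands in for Flink's User ClassLoader: it records each requested class name and then delegates to its parent, which is enough to show that the lookup goes through the TCCL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from flink-agents.
final class ContextClassLoading {

    // Resolve a class through the thread context classloader (TCCL).
    // Falling back to this class's own loader when no TCCL is set is an assumption.
    static Class<?> loadUserClass(String className) throws ClassNotFoundException {
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        ClassLoader loader = (tccl != null) ? tccl : ContextClassLoading.class.getClassLoader();
        return Class.forName(className, true, loader);
    }

    // Stand-in for Flink's User ClassLoader: records every lookup, then delegates to the parent.
    static final class RecordingClassLoader extends ClassLoader {
        final List<String> requested = new ArrayList<>();

        RecordingClassLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (requested) {
                requested.add(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        RecordingClassLoader userLoader =
                new RecordingClassLoader(ContextClassLoading.class.getClassLoader());

        // Mimic what Flink does before running user code: install the user loader as TCCL.
        current.setContextClassLoader(userLoader);
        try {
            Class<?> clazz = loadUserClass("java.util.ArrayList");
            System.out.println(clazz.getName() + " requested via TCCL: "
                    + userLoader.requested.contains("java.util.ArrayList"));
        } finally {
            // Always restore the previous TCCL so other code on this thread is unaffected.
            current.setContextClassLoader(original);
        }
    }
}
```

In a real deployment the TCCL would be Flink's User ClassLoader, so the same call would also find classes that exist only in the uploaded user JAR.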

Workaround

Place user-defined resource classes in /opt/flink/lib alongside flink-agents-dist.jar. This is impractical when the platform cannot anticipate what users will run, because every custom resource would require rebuilding the Docker image.

Fix

PR #514
