15 Oct 2025
In the rapidly evolving landscape of AI-powered applications, the ability to process and understand documents has become increasingly crucial. Whether you’re dealing with PDFs, Word documents, or PowerPoint presentations, extracting meaningful insights from unstructured data is a challenge many developers face daily.
In this post, we’ll explore how Apache Camel’s new AI components enable developers to build sophisticated RAG (Retrieval Augmented Generation) pipelines with minimal code. We’ll combine the power of Docling for document conversion with LangChain4j for AI orchestration, all orchestrated through Camel’s YAML DSL.
The Challenge: Document Intelligence at Scale
Companies are drowning in documents. Legal firms process contracts, healthcare providers manage medical records, and financial institutions analyze reports. The traditional approach of manual document review simply doesn’t scale.
This is a natural space in which to apply RAG with Apache Camel. The steps:
- Convert documents from any format to structured text
- Extract key insights and summaries
- Answer questions about document content
- Process documents in real-time as they arrive
This is where the combination of Docling and LangChain4j shines, and Apache Camel provides the perfect integration layer to bring them together.
Meet the Components
Camel-Docling: Enterprise Document Conversion
The camel-docling component integrates IBM’s Docling library, an AI-powered document parser that can handle various formats including PDF, Word, PowerPoint, and more. What makes Docling special is its ability to preserve document structure while converting to clean Markdown, HTML, or JSON.
Key features:
- Multiple Operations: Convert to Markdown, HTML, JSON, or extract structured data
- Flexible Deployment: Works with both CLI and API (docling-serve) modes
- Content Control: Return content directly in the message body or as file paths
- OCR Support: Handle scanned documents with optical character recognition
Camel-LangChain4j: AI Orchestration Made Simple
The camel-langchain4j-chat component provides seamless integration with Large Language Models through the LangChain4j framework. It supports various LLM providers including OpenAI, Ollama, and more.
Perfect for:
- Document analysis and summarization
- Question-answering systems
- Content generation
- RAG implementations
Building a RAG Pipeline with YAML
Let’s walk through a complete example that demonstrates the power of combining these components. Our goal is to create a system that automatically processes documents, analyzes them with AI, and generates comprehensive reports: a classic RAG scenario.
Architecture Overview
The flow is straightforward:
- Watch a directory for new documents
- Convert documents to Markdown using Docling
- Send the converted content to an LLM for analysis
- Generate a comprehensive analysis report
- Clean up processed files
All of this is defined declaratively in YAML, making it easy to understand and modify.
Setting Up the Infrastructure
First, we need our services running. Thanks to the camel infra command, this is pretty simple:
# Start Docling (if camel infra supports it)
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run docling
# Start Ollama (if camel infra supports it)
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run ollama
Or we could use Docker:
# Start Docling-Serve
$ docker run -d -p 5001:5001 --name docling-serve ghcr.io/docling-project/docling-serve:latest
# Start Ollama
$ docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
# Pull orca-mini model
$ docker exec -it ollama ollama pull orca-mini
We could also use Docker Compose:
$ docker compose up -d
$ docker exec -it ollama ollama pull orca-mini
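For reference, a minimal docker-compose.yml along these lines could look as follows. This is a sketch assembled from the docker run commands above (same images and port mappings), not necessarily the compose file shipped with the example:

```yaml
services:
  # Docling-Serve document conversion API, exposed on port 5001
  docling-serve:
    image: ghcr.io/docling-project/docling-serve:latest
    ports:
      - "5001:5001"
  # Ollama LLM runtime, exposed on port 11434
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
```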
Configuring the Chat Model
We use a Groovy script bean to configure our LangChain4j chat model:
- beans:
    - name: chatModel
      type: "#class:dev.langchain4j.model.ollama.OllamaChatModel"
      scriptLanguage: groovy
      script: |
        import dev.langchain4j.model.ollama.OllamaChatModel
        import static java.time.Duration.ofSeconds

        return OllamaChatModel.builder()
            .baseUrl("{{ollama.base.url}}")
            .modelName("{{ollama.model.name}}")
            .temperature(0.3)
            .timeout(ofSeconds(120))
            .build()
Notice how we use property placeholders ({{ollama.base.url}}) which Camel automatically resolves. This makes the configuration flexible and environment-agnostic.
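The placeholders are resolved from the application.properties file passed on the command line. A plausible set of values for this setup, derived from the ports and model used earlier (the exact property names beyond the placeholders shown above are assumptions):

```properties
ollama.base.url=http://localhost:11434
ollama.model.name=orca-mini
docling.serve.url=http://localhost:5001
documents.directory=documents
```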
The Main RAG Route
Here’s where the magic happens. The route watches for documents, processes them through Docling, and analyzes them with our LLM:
- route:
    id: document-analysis-workflow
    from:
      uri: file:{{documents.directory}}
      parameters:
        include: ".*\\.(pdf|docx|pptx|html|md)"
        noop: true
        idempotent: true
      steps:
        - log: "Processing document: ${header.CamelFileName}"
        # Convert GenericFile to file path
        - setBody:
            simple: "${body.file.absolutePath}"
        # Convert to Markdown
        - to:
            uri: docling:CONVERT_TO_MARKDOWN
            parameters:
              useDoclingServe: true
              doclingServeUrl: "{{docling.serve.url}}"
              contentInBody: true
        # Prepare AI prompt
        - setBody:
            simple: |
              You are a helpful document analysis assistant. Please analyze
              the following document and provide:
              1. A brief summary (2-3 sentences)
              2. Key topics and main points
              3. Any important findings or conclusions

              Document content:
              ${exchangeProperty.convertedMarkdown}
        # Get AI analysis
        - to:
            uri: langchain4j-chat:analysis
            parameters:
              chatModel: "#chatModel"
Interactive Q&A API
We also provide an HTTP endpoint for asking questions about documents:
- route:
    id: document-qa-api
    from:
      uri: platform-http:/api/ask
      steps:
        # Find latest document
        # Convert with Docling
        # Prepare RAG prompt with user question
        # Get answer from LLM
This enables interactive workflows:
$ curl -X POST http://localhost:8080/api/ask \
-d "What are the main topics in this document?"
Future Enhancements
Possible developments could be:
- Vector Storage Integration: Combine with camel-langchain4j-embeddings to store document chunks in vector databases for more sophisticated retrieval.
- Multi-Model Workflows: Use different models for different tasks - fast models for classification, powerful models for analysis.
- Streaming Responses: For long documents, stream LLM responses back to the client as they’re generated.
- Custom Tools: Integrate camel-langchain4j-tools to give the LLM access to external data sources.
Try It Yourself
The complete example is available in the Apache Camel repository under camel-jbang-examples/docling-langchain4j-rag. To run it:
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel run \
--fresh \
--dep=camel:docling \
--dep=camel:langchain4j-chat \
--dep=camel:platform-http \
--dep=dev.langchain4j:langchain4j:1.6.0 \
--dep=dev.langchain4j:langchain4j-ollama:1.6.0 \
--properties=application.properties \
docling-langchain4j-rag.yaml
Don’t forget to copy the sample.md into the documents directory!
Watch the logs as your document is processed, analyzed, and cleaned up automatically!
Conclusion
The combination of Apache Camel’s integration capabilities, Docling’s document conversion power, and LangChain4j’s AI orchestration creates a compelling platform for building intelligent document processing systems.
What makes this especially powerful is the declarative nature of the solution. The entire workflow is defined in ~350 lines of readable YAML, making it easy to understand, modify, and extend.
We’d love to hear about what you build with these components. Share your experiences on the Apache Camel mailing list or join us on Zulip chat!
Stay tuned for more examples combining Camel’s growing AI component ecosystem. The future of integration is intelligent, and we’re just getting started.
Happy integrating!
20 Dec 2022
Starting from Camel 3.19.0 we have four cloud services supported for loading properties as secrets:
- AWS Secret Manager
- Google Cloud Secret Manager
- Azure Key Vault
- Hashicorp Vault
One of the problems we faced during development was finding a way to automatically refresh secret values when a secret is updated.
The main players in the cloud game are providing solutions based on their services:
AWS provides multiple ways to be notified about secret updates and rotations, through AWS CloudTrail or AWS events; GCP leverages Google Pub/Sub to deliver messages for secret-related events; Azure provides multiple ways of being notified about events related to a vault in the Azure Key Vault service, mostly by using Azure Event Grid as an intermediate service.
HashiCorp Vault, as of today, doesn’t provide an API for secret notifications.
Enabling Automatic Camel context reloading after Secrets Refresh
AWS Secrets Manager
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the AWS service (CloudTrail) can be set through the default credentials provider or through access key/secret key/region credentials. The relevant properties are:
camel.vault.aws.refreshEnabled=true
camel.vault.aws.refreshPeriod=60000
camel.vault.aws.secrets=Secret
camel.main.context-reload-enabled = true
where camel.vault.aws.refreshEnabled enables automatic context reloading, camel.vault.aws.refreshPeriod is the interval of time between two checks for update events, and camel.vault.aws.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.aws.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with an aws: prefix.
The mechanism behind this feature, for AWS Secrets Manager, involves the AWS CloudTrail service. The task searches the Camel properties for secrets with the aws: prefix and looks for related events in the CloudTrail entries. Once the task finds an update operation, it triggers the context reload.
At the following URL, we provide a simple example through camel-jbang: AWS Secrets Manager Example
Google Secret Manager
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the Google service (Pub/Sub) can be set through the default Google instance or through the service account key file. The relevant properties are:
camel.vault.gcp.projectId= projectId
camel.vault.gcp.refreshEnabled=true
camel.vault.gcp.refreshPeriod=60000
camel.vault.gcp.secrets=hello*
camel.vault.gcp.subscriptionName=subscriptionName
camel.main.context-reload-enabled = true
where camel.vault.gcp.refreshEnabled enables automatic context reloading, camel.vault.gcp.refreshPeriod is the interval of time between two checks for update events, and camel.vault.gcp.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.gcp.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with a gcp: prefix.
The camel.vault.gcp.subscriptionName property is the name of the subscription created on the Google Pub/Sub topic associated with the tracked secrets.
This mechanism makes use of the notification system of Google Secret Manager: each secret can be associated with up to ten Google Pub/Sub topics, which receive events related to the life cycle of the secret.
At the following URL, we provide a simple example through camel-jbang: Google Secret Manager Example
Azure Key Vault
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the Azure service (Storage Blob) can be set through client id/client secret/tenant id. The relevant properties are:
camel.vault.azure.refreshEnabled=true
camel.vault.azure.refreshPeriod=60000
camel.vault.azure.secrets=Secret
camel.vault.azure.eventhubConnectionString=eventhub_conn_string
camel.vault.azure.blobAccountName=blob_account_name
camel.vault.azure.blobContainerName=blob_container_name
camel.vault.azure.blobAccessKey=blob_access_key
camel.main.context-reload-enabled = true
where camel.vault.azure.refreshEnabled enables automatic context reloading, camel.vault.azure.refreshPeriod is the interval of time between two checks for update events, and camel.vault.azure.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.azure.eventhubConnectionString is the Event Hub connection string to get notifications from; camel.vault.azure.blobAccountName, camel.vault.azure.blobContainerName, and camel.vault.azure.blobAccessKey are the Azure Storage Blob parameters for the checkpoint store needed by Azure Event Hub.
The property camel.vault.azure.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with an azure: prefix.
At the following URL, we provide a simple example through camel-jbang: Azure Key Vault Example
A real example
While working on the feature I tried to set up a real use-case scenario. In my case, I’ve been updating a database administrator password while a running Camel integration was querying the database.
This is well explained in the following example, which the reader can run through camel-jbang: MySQL Database Password Refresh Example
Future
In upcoming Camel development on the secret update feature, we would like to provide the ability to select different kinds of tasks/policies to trigger a context reload. For example, we would like to support secret rotation events coming from AWS services that support rotation. This is on our roadmap.
If you find this feature useful and you feel there is something to improve, don’t hesitate to contact us via the usual channels: Camel Support
Thanks for your attention and see you online.
26 Jul 2022
In Camel 3.16.0 we introduced the ability to load properties from vault and use them in the Camel context.
This post aims to show the updates and improvements we’ve done in the last two releases.
Supported Services
In 3.16.0 we’re supporting two of the main services available in the cloud space:
- AWS Secret Manager
- Google Cloud Secret Manager
In 3.19.0, to be released, we’re going to have four services available:
- AWS Secret Manager
- Google Cloud Secret Manager
- Azure Key Vault
- Hashicorp Vault
Setting up the Properties Function
Each of the secret management cloud services requires different parameters to complete authentication and authorization.
For both the Properties Functions currently available we provide two different approaches:
- Environment variables
- Main Configuration properties
The information for AWS and GCP is already available in the previous blog post.
Let’s explore Azure Key Vault and Hashicorp Vault.
Azure Key Vault
The Azure Key Vault Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_AZURE_TENANT_ID=tenantId
export CAMEL_VAULT_AZURE_CLIENT_ID=clientId
export CAMEL_VAULT_AZURE_CLIENT_SECRET=clientSecret
export CAMEL_VAULT_AZURE_VAULT_NAME=vaultName
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.azure.tenantId = tenantId
camel.vault.azure.clientId = clientId
camel.vault.azure.clientSecret = clientSecret
camel.vault.azure.vaultName = vaultName
To recover a secret from Azure you might run something like:
<camelContext>
<route>
<from uri="direct:start"/>
<to uri="{{azure:route}}"/>
</route>
</camelContext>
Hashicorp Vault
The Hashicorp Vault Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_HASHICORP_TOKEN=token
export CAMEL_VAULT_HASHICORP_ENGINE=engine
export CAMEL_VAULT_HASHICORP_HOST=host
export CAMEL_VAULT_HASHICORP_PORT=port
export CAMEL_VAULT_HASHICORP_SCHEME=http/https
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.hashicorp.token = token
camel.vault.hashicorp.engine = engine
camel.vault.hashicorp.host = host
camel.vault.hashicorp.port = port
camel.vault.hashicorp.scheme = scheme
To recover a secret from Hashicorp Vault you might run something like:
<camelContext>
<route>
<from uri="direct:start"/>
<to uri="{{hashicorp:route}}"/>
</route>
</camelContext>
Multi fields Secrets and Default value
As with AWS Secrets Manager and Google Secret Manager, multi-field secrets and default values are both supported by the Azure Key Vault and Hashicorp Vault Properties Functions.
Versioning
In the next Camel version we are going to release support for recovering a secret with a particular version. This will be supported by all the vaults we currently support in Camel.
In particular, you’ll be able to recover a specific version of a secret with the following syntax.
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{hashicorp:database/username:admin@2}}"/>
</route>
</camelContext>
In this example we’re going to recover the field username from the secret database, with version “2”. In case that version is not available, we fall back to the default value ‘admin’.
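Putting the pieces together, the placeholder grammar is roughly vault:secret[/field][:default][@version]. As a purely illustrative sketch (not Apache Camel's actual parser), the reference could be decomposed like this:

```python
def parse_placeholder(ref: str) -> dict:
    """Split 'vault:secret[/field][:default][@version]' into its parts.

    Illustrative only: mirrors the placeholder syntax described in the
    post, not Apache Camel's real Properties Function implementation.
    """
    scheme, rest = ref.split(":", 1)  # e.g. "hashicorp", "aws", "gcp"
    version = default = field = None
    if "@" in rest:                    # version comes last
        rest, version = rest.rsplit("@", 1)
    if ":" in rest:                    # optional default value
        rest, default = rest.split(":", 1)
    if "/" in rest:                    # optional field inside a JSON secret
        rest, field = rest.split("/", 1)
    return {"vault": scheme, "secret": rest, "field": field,
            "default": default, "version": version}

# The example from the route above:
print(parse_placeholder("hashicorp:database/username:admin@2"))
```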
Future
We plan to work on the ability to reload the whole context once a secret has been rotated or updated. This is something still in the design phase, but we really would like to see it implemented soon.
Stay tuned for more news!
24 Mar 2022
In the last weeks, together with Claus, we’ve been working on a new feature: loading properties from Vault/Secrets cloud services.
It will arrive with Camel 3.16.0, currently on vote and to be released by the end of this week (24/3).
This post introduces the new feature and provides some examples.
Secrets Management in Camel
In the past there were many discussions around the possibility of managing secrets in Camel through Vault Services.
The hidden troubles are a lot when we talk about Secrets Management:
- Ability to automatically retrieve secrets after a secret rotation has been completed
- Writing the function (script, serverless function etc.) to operate the rotation
- Being notified once a rotation happens
We chose to start from the beginning: retrieving secrets from a vault service and using them as properties in the Camel configuration.
Supported Services
In 3.16.0 we’re supporting two of the main services available in the cloud space:
- AWS Secret Manager
- Google Cloud Secret Manager
How it works
The Vault feature works by specifying a particular prefix while using the Properties component.
For example for AWS:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{aws:username}}"/>
</route>
</camelContext>
or
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:username}}"/>
</route>
</camelContext>
This notation triggers the following workflow when starting a Camel route:
- Connect and authenticate to AWS Secret Manager (or GCP Secret Manager)
- Retrieve the value of the secret named username
- Substitute the property with the secret value just returned
To use these Properties Functions, the two requirements are adding the camel-aws-secret-manager JAR (for AWS) or the camel-google-secret-manager JAR (for GCP), and setting up the credentials to access the cloud service.
Setting up the Properties Function
Each of the secret management cloud services requires different parameters to complete authentication and authorization.
For both the Properties Functions currently available we provide two different approaches:
- Environment variables
- Main Configuration properties
AWS Secrets Manager
The AWS Secrets Manager Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_AWS_ACCESS_KEY=accessKey
export CAMEL_VAULT_AWS_SECRET_KEY=secretKey
export CAMEL_VAULT_AWS_REGION=region
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.aws.accessKey = accessKey
camel.vault.aws.secretKey = secretKey
camel.vault.aws.region = region
The above examples do not use the Default Credentials Provider chain coming from the AWS SDK, but the Properties Function can be configured that way too. This is how to do it through environment variables:
export CAMEL_VAULT_AWS_USE_DEFAULT_CREDENTIALS_PROVIDER=true
export CAMEL_VAULT_AWS_REGION=region
This could be done even with main configuration properties:
camel.vault.aws.defaultCredentialsProvider = true
camel.vault.aws.region = region
GCP Secret Manager
The GCP Secret Manager Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_GCP_SERVICE_ACCOUNT_KEY=file:////path/to/service.accountkey
export CAMEL_VAULT_GCP_PROJECT_ID=projectId
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.gcp.serviceAccountKey = serviceAccountKey
camel.vault.gcp.projectId = projectId
The above examples do not use the Default Credentials Provider coming from GCP, but the Properties Function can be configured that way too. This is how to do it through environment variables:
export CAMEL_VAULT_GCP_USE_DEFAULT_INSTANCE=true
export CAMEL_VAULT_GCP_PROJECT_ID=projectId
This could be done even with main configuration properties:
camel.vault.gcp.useDefaultInstance = true
camel.vault.gcp.projectId = projectId
Multi fields Secrets
Some of the Secret manager services allow users to create multiple fields in a secret, like for example:
{
"username": "admin",
"password": "password123",
"engine": "postgres",
"host": "127.0.0.1",
"port": "3128",
"dbname": "db"
}
Usually the format of the secret will be JSON. With the secrets-related Properties Functions we can retrieve a single field of the secret and use it in a route, for example:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:database/username}}"/>
</route>
</camelContext>
In this route the property will be replaced by the field username of the value of the secret named database.
Default Values
It is possible to fall back to a default value in case the particular field of the secret is not present in GCP Secret Manager. Taking the example above:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:database/username:admin}}"/>
</route>
</camelContext>
And in case something goes wrong, like authentication failing, the secret not existing, or the service being down, the value returned will be admin.
Future
In the next Camel version we are planning to work on more secret management services. In particular, we want to add two more components to the list:
- Azure Key Vault
- Hashicorp Vault
Follow Camel’s development to learn more about the work in progress.
Use the Properties Functions in your projects and give us feedback once release 3.16.0 is out (the vote is in progress these days).
Stay tuned!
19 Apr 2021
In the last weeks I was focused on a particular feature for the Camel AWS S3 component: the streaming upload feature.
In this post I’m going to summarize what it is and how to use it.
Streaming upload
The AWS S3 component already had a multipart upload feature in its producer operations; the main “problem” with it was the need to know the size of the upload ahead of time.
The streaming upload feature coming in Camel 3.10.0 won’t need to know the size before starting the upload.
How it works
This feature has been implemented on the S3 component’s producer side.
The idea is to continuously send data to the producer, batching the messages. On the endpoint there are three possible ways of stopping the batching:
- timeout
- buffer size
- batch size
Buffer size and batch size work together: the batch is completed when the batch message count is reached or when the configured buffer size has been exceeded.
With a timeout set, batching also stops and the upload completes when the timeout is reached.
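The three completion conditions above can be sketched as follows. This is illustrative Python, not the camel-aws2-s3 implementation; the parameter names and defaults here are assumptions for the sketch:

```python
import time

class StreamingBatcher:
    """Sketch of the three completion conditions described above:
    batch size (message count), buffer size (bytes), and timeout.
    Not the actual camel-aws2-s3 code; names are illustrative."""

    def __init__(self, batch_messages=25, buffer_bytes=1_000_000, timeout_s=60):
        self.batch_messages = batch_messages
        self.buffer_bytes = buffer_bytes
        self.timeout_s = timeout_s
        self.messages = []
        self.size = 0
        self.started = time.monotonic()

    def add(self, payload: bytes) -> None:
        # Accumulate messages and track the total byte size of the batch
        self.messages.append(payload)
        self.size += len(payload)

    def should_flush(self) -> bool:
        # The batch completes as soon as any one of the three limits is hit
        return (len(self.messages) >= self.batch_messages
                or self.size > self.buffer_bytes
                or time.monotonic() - self.started >= self.timeout_s)
```

Whichever limit fires first wins, which is why buffer size and batch size are described as working together.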
S3 Files naming
In the streaming upload producer, two naming strategies are provided: progressive and random.
The progressive one adds a progressive suffix to each uploaded part, while the random one adds a random id as key name suffix.
If the S3 key name specified on the endpoint is “file_upload_part.txt”, during the upload you can expect a list like:
- file_upload_part.txt
- file_upload_part-1.txt
- file_upload_part-2.txt
and so on.
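The key names above can be reproduced with a small sketch of the progressive strategy. This is an illustration of the naming shown in the list, not the component's actual code:

```python
def progressive_key(base: str, index: int) -> str:
    """Sketch of the progressive naming strategy: part 0 keeps the
    original key name, later parts get a numeric suffix before the
    extension. Illustrative, not camel-aws2-s3's implementation."""
    if index == 0:
        return base
    stem, dot, ext = base.rpartition(".")
    return f"{stem}-{index}.{ext}" if dot else f"{base}-{index}"

# Reproduces the list from the post:
for i in range(3):
    print(progressive_key("file_upload_part.txt", i))
```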
The progressive naming strategy raises an obvious question: what happens when the route is stopped and restarted?
Restarting Strategies
The S3 streaming upload producer provides restarting strategies as well; the lastPart strategy, obviously, only makes sense in combination with the progressive naming strategy.
When the route restarts, the producer checks the specified bucket for the configured S3 key name prefix and gets the last uploaded index.
That index is then used to resume from the same point.
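The "recover the last index" step can be sketched like this: scan the existing key names for the configured prefix and extract the highest progressive suffix. Again an illustration of the idea, not the component's code:

```python
import re

def last_uploaded_index(keys, base="file_upload_part.txt"):
    """Sketch of the lastPart restarting idea: look at the existing
    keys with the configured prefix and recover the highest
    progressive index, so the upload can resume from there.
    Illustrative, not camel-aws2-s3's implementation."""
    stem = base.rsplit(".", 1)[0]
    # Match e.g. "file_upload_part-2.txt" and capture the index
    pattern = re.compile(re.escape(stem) + r"-(\d+)\.")
    indexes = [int(m.group(1)) for k in keys if (m := pattern.match(k))]
    return max(indexes, default=0)

existing = ["file_upload_part.txt", "file_upload_part-1.txt",
            "file_upload_part-2.txt"]
print(last_uploaded_index(existing))
```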
Sample
This feature is very nice to see in action.
In the camel-examples repository I added an example of the feature with Kafka as the source.
The example polls the Kafka topic s3.topic.1 and uploads batches of 25 messages (or 1 MB) as single files into an S3 bucket (mycamel-1).
The “how to run” section of the README explains how to ingest data into your Kafka broker.
Conclusion
The streaming upload feature will be useful in situations where users don’t know the amount of data they want to upload to S3, but also when they just want to ingest data continuously without having to care about its size.
There is probably more work to do, but this can be a feature to introduce even in other storage components we have in Apache Camel.