15 Oct 2025
In the rapidly evolving landscape of AI-powered applications, the ability to process and understand documents has become increasingly crucial. Whether you’re dealing with PDFs, Word documents, or PowerPoint presentations, extracting meaningful insights from unstructured data is a challenge many developers face daily.
In this post, we’ll explore how Apache Camel’s new AI components enable developers to build sophisticated RAG (Retrieval Augmented Generation) pipelines with minimal code. We’ll combine the power of Docling for document conversion with LangChain4j for AI orchestration, all orchestrated through Camel’s YAML DSL.
The Challenge: Document Intelligence at Scale
Companies are drowning in documents. Legal firms process contracts, healthcare providers manage medical records, and financial institutions analyze reports. The traditional approach of manual document review simply doesn’t scale.
This is a natural space in which to apply RAG with Apache Camel. The steps:
- Convert documents from any format to structured text
- Extract key insights and summaries
- Answer questions about document content
- Process documents in real-time as they arrive
This is where the combination of Docling and LangChain4j shines, and Apache Camel provides the perfect integration layer to bring them together.
Meet the Components
Camel-Docling: Enterprise Document Conversion
The camel-docling component integrates IBM’s Docling library, an AI-powered document parser that can handle various formats including PDF, Word, PowerPoint, and more. What makes Docling special is its ability to preserve document structure while converting to clean Markdown, HTML, or JSON.
Key features:
- Multiple Operations: Convert to Markdown, HTML, JSON, or extract structured data
- Flexible Deployment: Works with both CLI and API (docling-serve) modes
- Content Control: Return content directly in the message body or as file paths
- OCR Support: Handle scanned documents with optical character recognition
Camel-LangChain4j: AI Orchestration Made Simple
The camel-langchain4j-chat component provides seamless integration with Large Language Models through the LangChain4j framework. It supports various LLM providers including OpenAI, Ollama, and more.
Perfect for:
- Document analysis and summarization
- Question-answering systems
- Content generation
- RAG implementations
Building a RAG Pipeline with YAML
Let’s walk through a complete example that demonstrates the power of combining these components. Our goal is to create a system that automatically processes documents, analyzes them with AI, and generates comprehensive reports: a classic RAG scenario.
Architecture Overview
The flow is straightforward:
- Watch a directory for new documents
- Convert documents to Markdown using Docling
- Send the converted content to an LLM for analysis
- Generate a comprehensive analysis report
- Clean up processed files
All of this is defined declaratively in YAML, making it easy to understand and modify.
Setting Up the Infrastructure
First, we need our services running. Thanks to the camel infra command, this is pretty simple:
# Start Docling (if camel infra supports it)
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run docling
# Start Ollama (if camel infra supports it)
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel infra run ollama
Or we could use Docker:
# Start Docling-Serve
$ docker run -d -p 5001:5001 --name docling-serve ghcr.io/docling-project/docling-serve:latest
# Start Ollama
$ docker run -d -p 11434:11434 --name ollama ollama/ollama:latest
# Pull orca-mini model
$ docker exec -it ollama ollama pull orca-mini
We could also use Docker Compose:
$ docker compose up -d
$ docker exec -it ollama ollama pull orca-mini
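For reference, a minimal docker-compose.yml along these lines could look as follows. This is a sketch assembled from the docker run commands above (same images and port mappings), not necessarily the compose file shipped with the example:

```yaml
services:
  # Docling-Serve document conversion API, exposed on port 5001
  docling-serve:
    image: ghcr.io/docling-project/docling-serve:latest
    ports:
      - "5001:5001"
  # Ollama LLM runtime, exposed on port 11434
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
```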
Configuring the Chat Model
We use a Groovy script bean to configure our LangChain4j chat model:
- beans:
    - name: chatModel
      type: "#class:dev.langchain4j.model.ollama.OllamaChatModel"
      scriptLanguage: groovy
      script: |
        import dev.langchain4j.model.ollama.OllamaChatModel
        import static java.time.Duration.ofSeconds

        return OllamaChatModel.builder()
            .baseUrl("{{ollama.base.url}}")
            .modelName("{{ollama.model.name}}")
            .temperature(0.3)
            .timeout(ofSeconds(120))
            .build()
Notice how we use property placeholders ({{ollama.base.url}}) which Camel automatically resolves. This makes the configuration flexible and environment-agnostic.
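The placeholders are resolved from the application.properties file passed on the command line. A plausible set of values for this setup, derived from the ports and model used earlier (the exact property names beyond the placeholders shown above are assumptions):

```properties
ollama.base.url=http://localhost:11434
ollama.model.name=orca-mini
docling.serve.url=http://localhost:5001
documents.directory=documents
```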
The Main RAG Route
Here’s where the magic happens. The route watches for documents, processes them through Docling, and analyzes them with our LLM:
- route:
    id: document-analysis-workflow
    from:
      uri: file:{{documents.directory}}
      parameters:
        include: ".*\\.(pdf|docx|pptx|html|md)"
        noop: true
        idempotent: true
      steps:
        - log: "Processing document: ${header.CamelFileName}"
        # Convert GenericFile to file path
        - setBody:
            simple: "${body.file.absolutePath}"
        # Convert to Markdown
        - to:
            uri: docling:CONVERT_TO_MARKDOWN
            parameters:
              useDoclingServe: true
              doclingServeUrl: "{{docling.serve.url}}"
              contentInBody: true
        # Prepare AI prompt
        - setBody:
            simple: |
              You are a helpful document analysis assistant. Please analyze
              the following document and provide:
              1. A brief summary (2-3 sentences)
              2. Key topics and main points
              3. Any important findings or conclusions

              Document content:
              ${exchangeProperty.convertedMarkdown}
        # Get AI analysis
        - to:
            uri: langchain4j-chat:analysis
            parameters:
              chatModel: "#chatModel"
Interactive Q&A API
We also provide an HTTP endpoint for asking questions about documents:
- route:
    id: document-qa-api
    from:
      uri: platform-http:/api/ask
      steps:
        # Find latest document
        # Convert with Docling
        # Prepare RAG prompt with user question
        # Get answer from LLM
This enables interactive workflows:
$ curl -X POST http://localhost:8080/api/ask \
-d "What are the main topics in this document?"
Future Enhancements
Possible developments could be:
- Vector Storage Integration: Combine with camel-langchain4j-embeddings to store document chunks in vector databases for more sophisticated retrieval.
- Multi-Model Workflows: Use different models for different tasks - fast models for classification, powerful models for analysis.
- Streaming Responses: For long documents, stream LLM responses back to the client as they’re generated.
- Custom Tools: Integrate camel-langchain4j-tools to give the LLM access to external data sources.
Try It Yourself
The complete example is available in the Apache Camel repository under camel-jbang-examples/docling-langchain4j-rag. To run it:
$ jbang -Dcamel.jbang.version=4.16.0-SNAPSHOT camel@apache/camel run \
--fresh \
--dep=camel:docling \
--dep=camel:langchain4j-chat \
--dep=camel:platform-http \
--dep=dev.langchain4j:langchain4j:1.6.0 \
--dep=dev.langchain4j:langchain4j-ollama:1.6.0 \
--properties=application.properties \
docling-langchain4j-rag.yaml
Don’t forget to copy the sample.md into the documents directory!
Watch the logs as your document is processed, analyzed, and cleaned up automatically!
Conclusion
The combination of Apache Camel’s integration capabilities, Docling’s document conversion power, and LangChain4j’s AI orchestration creates a compelling platform for building intelligent document processing systems.
What makes this especially powerful is the declarative nature of the solution. The entire workflow is defined in ~350 lines of readable YAML, making it easy to understand, modify, and extend.
We’d love to hear about what you build with these components. Share your experiences on the Apache Camel mailing list or join us on Zulip chat!
Stay tuned for more examples combining Camel’s growing AI component ecosystem. The future of integration is intelligent, and we’re just getting started.
Happy integrating!
20 Dec 2022
Starting from Camel 3.19.0 we have four cloud services supported for loading properties as secrets:
- AWS Secret Manager
- Google Cloud Secret Manager
- Azure Key Vault
- Hashicorp Vault
One of the problems we faced during development was finding a way to automatically refresh secret values when a secret is updated.
The main players in the cloud game are providing solutions based on their services:
AWS provides multiple ways to be notified about secret updates and rotations, through AWS CloudTrail or AWS events; GCP leverages Google Pub/Sub to deliver messages for secret-related events; Azure provides multiple ways of being notified about events related to a vault in the Azure Key Vault service, mostly by using Azure Event Grid as an intermediate service.
HashiCorp Vault, as of today, doesn’t provide an API for secret notifications.
Enabling Automatic Camel context reloading after Secrets Refresh
AWS Secrets Manager
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the AWS service (CloudTrail) can be set through the default credentials provider or through access key/secret key/region credentials. The relevant properties are:
camel.vault.aws.refreshEnabled=true
camel.vault.aws.refreshPeriod=60000
camel.vault.aws.secrets=Secret
camel.main.context-reload-enabled = true
where camel.vault.aws.refreshEnabled enables automatic context reloading, camel.vault.aws.refreshPeriod is the interval of time between two checks for update events, and camel.vault.aws.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.aws.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with an aws: prefix.
The mechanism behind this feature, for AWS Secrets Manager, involves the AWS CloudTrail service. The task searches the Camel properties for secrets with the aws: prefix and looks for related events in the CloudTrail entries. Once the task finds an update operation, it triggers the context reload.
At the following URL, we provide a simple example through camel-jbang: AWS Secrets Manager Example
Google Secret Manager
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the Google service (Pub/Sub) can be set through the default Google instance or through the service account key file. The relevant properties are:
camel.vault.gcp.projectId= projectId
camel.vault.gcp.refreshEnabled=true
camel.vault.gcp.refreshPeriod=60000
camel.vault.gcp.secrets=hello*
camel.vault.gcp.subscriptionName=subscriptionName
camel.main.context-reload-enabled = true
where camel.vault.gcp.refreshEnabled enables automatic context reloading, camel.vault.gcp.refreshPeriod is the interval of time between two checks for update events, and camel.vault.gcp.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.gcp.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with a gcp: prefix.
The camel.vault.gcp.subscriptionName property is the name of the subscription created on the Google Pub/Sub topic associated with the tracked secrets.
This mechanism makes use of the notification system of Google Secret Manager: each secret can be associated with up to ten Google Pub/Sub topics, which receive events related to the life cycle of the secret.
At the following URL, we provide a simple example through camel-jbang: Google Secret Manager Example
Azure Key Vault
Automatic context reloading can be enabled through Camel’s main properties. In particular, authentication to the Azure service (Storage Blob) can be set through client id/client secret/tenant id. The relevant properties are:
camel.vault.azure.refreshEnabled=true
camel.vault.azure.refreshPeriod=60000
camel.vault.azure.secrets=Secret
camel.vault.azure.eventhubConnectionString=eventhub_conn_string
camel.vault.azure.blobAccountName=blob_account_name
camel.vault.azure.blobContainerName=blob_container_name
camel.vault.azure.blobAccessKey=blob_access_key
camel.main.context-reload-enabled = true
where camel.vault.azure.refreshEnabled enables automatic context reloading, camel.vault.azure.refreshPeriod is the interval of time between two checks for update events, and camel.vault.azure.secrets is a regex representing the secrets we want to track for updates.
The property camel.vault.azure.eventhubConnectionString is the Event Hub connection string to get notifications from; camel.vault.azure.blobAccountName, camel.vault.azure.blobContainerName, and camel.vault.azure.blobAccessKey are the Azure Storage Blob parameters for the checkpoint store needed by Azure Event Hub.
The property camel.vault.azure.secrets is not mandatory: if not specified, the task responsible for checking update events will take into account the properties with an azure: prefix.
At the following URL, we provide a simple example through camel-jbang: Azure Key Vault Example
A real example
While working on the feature I tried to set up a real use-case scenario. In my case, I’ve been updating a database administrator password while a running Camel integration was querying the database.
This is well explained in the following example, which the reader can run through camel-jbang: MySQL Database Password Refresh Example
Future
In upcoming Camel development on the secret update feature, we would like to provide the ability to select different kinds of tasks/policies to trigger a context reload. For example, we would like to support secret rotation events coming from AWS services that support rotation. This is on our roadmap.
If you find this feature useful and you feel there is something to improve, don’t hesitate to contact us via the usual channels: Camel Support
Thanks for your attention and see you online.
26 Jul 2022
In Camel 3.16.0 we introduced the ability to load properties from vault and use them in the Camel context.
This post aims to show the updates and improvements we’ve done in the last two releases.
Supported Services
In 3.16.0 we’re supporting two of the main services available in the cloud space:
- AWS Secret Manager
- Google Cloud Secret Manager
In 3.19.0, to be released, we’re going to have four services available:
- AWS Secret Manager
- Google Cloud Secret Manager
- Azure Key Vault
- Hashicorp Vault
Setting up the Properties Function
Each of the secret management cloud services requires different parameters to complete authentication and authorization.
For both the Properties Functions currently available we provide two different approaches:
- Environment variables
- Main Configuration properties
The information for AWS and GCP is already available in the previous blog post.
Let’s explore Azure Key Vault and Hashicorp Vault.
Azure Key Vault
The Azure Key Vault Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_AZURE_TENANT_ID=tenantId
export CAMEL_VAULT_AZURE_CLIENT_ID=clientId
export CAMEL_VAULT_AZURE_CLIENT_SECRET=clientSecret
export CAMEL_VAULT_AZURE_VAULT_NAME=vaultName
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.azure.tenantId = tenantId
camel.vault.azure.clientId = clientId
camel.vault.azure.clientSecret = clientSecret
camel.vault.azure.vaultName = vaultName
To recover a secret from Azure you might run something like:
<camelContext>
<route>
<from uri="direct:start"/>
<to uri="{{azure:route}}"/>
</route>
</camelContext>
Hashicorp Vault
The Hashicorp Vault Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_HASHICORP_TOKEN=token
export CAMEL_VAULT_HASHICORP_ENGINE=engine
export CAMEL_VAULT_HASHICORP_HOST=host
export CAMEL_VAULT_HASHICORP_PORT=port
export CAMEL_VAULT_HASHICORP_SCHEME=http/https
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.hashicorp.token = token
camel.vault.hashicorp.engine = engine
camel.vault.hashicorp.host = host
camel.vault.hashicorp.port = port
camel.vault.hashicorp.scheme = scheme
To recover a secret from Hashicorp Vault you might run something like:
<camelContext>
<route>
<from uri="direct:start"/>
<to uri="{{hashicorp:route}}"/>
</route>
</camelContext>
Multi fields Secrets and Default value
As with AWS Secrets Manager and Google Secret Manager, multi-field secrets and default values are both supported by the Azure Key Vault and Hashicorp Vault Properties Functions.
Versioning
In the next Camel version we are going to release support for recovering a secret with a particular version. This will be supported by all the vaults we currently support in Camel.
In particular, you’ll be able to recover a specific version of a secret with the following syntax.
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{hashicorp:database/username:admin@2}}"/>
</route>
</camelContext>
In this example we’re going to recover the field username from the secret database, with version “2”. In case that version is not available, we fall back to the default value ‘admin’.
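Putting the pieces together, the placeholder grammar is roughly vault:secret[/field][:default][@version]. As a purely illustrative sketch (not Apache Camel's actual parser), the reference could be decomposed like this:

```python
def parse_placeholder(ref: str) -> dict:
    """Split 'vault:secret[/field][:default][@version]' into its parts.

    Illustrative only: mirrors the placeholder syntax described in the
    post, not Apache Camel's real Properties Function implementation.
    """
    scheme, rest = ref.split(":", 1)  # e.g. "hashicorp", "aws", "gcp"
    version = default = field = None
    if "@" in rest:                    # version comes last
        rest, version = rest.rsplit("@", 1)
    if ":" in rest:                    # optional default value
        rest, default = rest.split(":", 1)
    if "/" in rest:                    # optional field inside a JSON secret
        rest, field = rest.split("/", 1)
    return {"vault": scheme, "secret": rest, "field": field,
            "default": default, "version": version}

# The example from the route above:
print(parse_placeholder("hashicorp:database/username:admin@2"))
```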
Future
We plan to work on the ability to reload the whole context once a secret has been rotated or updated. This is something still in the design phase, but we really would like to see it implemented soon.
Stay tuned for more news!
24 Mar 2022
In the last weeks, together with Claus, we’ve been working on a new feature: loading properties from Vault/Secrets cloud services.
It will arrive with Camel 3.16.0, currently on vote and to be released by the end of this week (24/3).
This post introduces the new feature and provides some examples.
Secrets Management in Camel
In the past there were many discussions around the possibility of managing secrets in Camel through Vault Services.
The hidden troubles are a lot when we talk about Secrets Management:
- Ability to automatically retrieve secrets after a secret rotation has been completed
- Writing the function (script, serverless function etc.) to operate the rotation
- Being notified once a rotation happens
We chose to start from the beginning: retrieving secrets from a vault service and using them as properties in the Camel configuration.
Supported Services
In 3.16.0 we’re supporting two of the main services available in the cloud space:
- AWS Secret Manager
- Google Cloud Secret Manager
How it works
The Vault feature works by specifying a particular prefix while using the Properties component.
For example for AWS:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{aws:username}}"/>
</route>
</camelContext>
or
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:username}}"/>
</route>
</camelContext>
This notation triggers the following workflow when starting a Camel route:
- Connect and authenticate to AWS Secret Manager (or GCP Secret Manager)
- Retrieve the value of the secret named username
- Substitute the property with the secret value just returned
To use these Properties Functions, the two requirements are adding the camel-aws-secret-manager JAR (for AWS) or the camel-google-secret-manager JAR (for GCP), and setting up the credentials to access the cloud service.
Setting up the Properties Function
Each of the secret management cloud services requires different parameters to complete authentication and authorization.
For both the Properties Functions currently available we provide two different approaches:
- Environment variables
- Main Configuration properties
AWS Secrets Manager
The AWS Secrets Manager Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_AWS_ACCESS_KEY=accessKey
export CAMEL_VAULT_AWS_SECRET_KEY=secretKey
export CAMEL_VAULT_AWS_REGION=region
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.aws.accessKey = accessKey
camel.vault.aws.secretKey = secretKey
camel.vault.aws.region = region
The above examples do not use the Default Credentials Provider chain coming from the AWS SDK, but the Properties Function can be configured that way too. This is how to do it through environment variables:
export CAMEL_VAULT_AWS_USE_DEFAULT_CREDENTIALS_PROVIDER=true
export CAMEL_VAULT_AWS_REGION=region
This could be done even with main configuration properties:
camel.vault.aws.defaultCredentialsProvider = true
camel.vault.aws.region = region
GCP Secret Manager
The GCP Secret Manager Properties Function configuration through environment variables is the following:
export CAMEL_VAULT_GCP_SERVICE_ACCOUNT_KEY=file:////path/to/service.accountkey
export CAMEL_VAULT_GCP_PROJECT_ID=projectId
While as Main Configuration properties it is possible to define the credentials through the following:
camel.vault.gcp.serviceAccountKey = serviceAccountKey
camel.vault.gcp.projectId = projectId
The above examples do not use the Default Credentials Provider coming from GCP, but the Properties Function can be configured that way too. This is how to do it through environment variables:
export CAMEL_VAULT_GCP_USE_DEFAULT_INSTANCE=true
export CAMEL_VAULT_GCP_PROJECT_ID=projectId
This could be done even with main configuration properties:
camel.vault.gcp.useDefaultInstance = true
camel.vault.gcp.projectId = projectId
Multi fields Secrets
Some of the Secret manager services allow users to create multiple fields in a secret, like for example:
{
"username": "admin",
"password": "password123",
"engine": "postgres",
"host": "127.0.0.1",
"port": "3128",
"dbname": "db"
}
Usually the format of the secret will be JSON. With the secrets-related Properties Functions we can retrieve a single field of the secret and use it in a route, for example:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:database/username}}"/>
</route>
</camelContext>
In this route the property will be replaced by the field username of the value of the secret named database.
Default Values
It is possible to fall back to a default value in case the particular field of the secret is not present in GCP Secret Manager. Taking the example above:
<camelContext>
<route>
<from uri="direct:start"/>
<log message="Username is {{gcp:database/username:admin}}"/>
</route>
</camelContext>
And in case something goes wrong, like authentication failing, the secret not existing, or the service being down, the value returned will be admin.
Future
In the next Camel version we are planning to work on more secret management services. In particular, we want to add two more components to the list:
- Azure Key Vault
- Hashicorp Vault
Follow Camel’s development to learn more about the work in progress.
Use the Properties Functions in your projects and give us feedback once release 3.16.0 is out (the vote is in progress these days).
Stay tuned!
19 Apr 2021
In the last weeks I was focused on a particular feature for the Camel AWS S3 component: the streaming upload feature.
In this post I’m going to summarize what it is and how to use it.
Streaming upload
The AWS S3 component already had a multipart upload feature in its producer operations; the main “problem” with it was the need to know the size of the upload ahead of time.
The streaming upload feature coming in Camel 3.10.0 won’t need to know the size before starting the upload.
How it works
This feature has been implemented on the S3 component’s producer side.
The idea is to continuously send data to the producer, batching the messages. On the endpoint there are three possible ways of stopping the batching:
- timeout
- buffer size
- batch size
Buffer size and batch size work together: the batch is completed when the batch message count is reached or when the configured buffer size has been exceeded.
With a timeout set, batching also stops and the upload completes when the timeout is reached.
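The three completion conditions above can be sketched as follows. This is illustrative Python, not the camel-aws2-s3 implementation; the parameter names and defaults here are assumptions for the sketch:

```python
import time

class StreamingBatcher:
    """Sketch of the three completion conditions described above:
    batch size (message count), buffer size (bytes), and timeout.
    Not the actual camel-aws2-s3 code; names are illustrative."""

    def __init__(self, batch_messages=25, buffer_bytes=1_000_000, timeout_s=60):
        self.batch_messages = batch_messages
        self.buffer_bytes = buffer_bytes
        self.timeout_s = timeout_s
        self.messages = []
        self.size = 0
        self.started = time.monotonic()

    def add(self, payload: bytes) -> None:
        # Accumulate messages and track the total byte size of the batch
        self.messages.append(payload)
        self.size += len(payload)

    def should_flush(self) -> bool:
        # The batch completes as soon as any one of the three limits is hit
        return (len(self.messages) >= self.batch_messages
                or self.size > self.buffer_bytes
                or time.monotonic() - self.started >= self.timeout_s)
```

Whichever limit fires first wins, which is why buffer size and batch size are described as working together.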
S3 Files naming
In the streaming upload producer, two naming strategies are provided: progressive and random.
The progressive one adds a progressive suffix to each uploaded part, while the random one adds a random id as key name suffix.
If the S3 key name specified on the endpoint is “file_upload_part.txt”, during the upload you can expect a list like:
- file_upload_part.txt
- file_upload_part-1.txt
- file_upload_part-2.txt
and so on.
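The key names above can be reproduced with a small sketch of the progressive strategy. This is an illustration of the naming shown in the list, not the component's actual code:

```python
def progressive_key(base: str, index: int) -> str:
    """Sketch of the progressive naming strategy: part 0 keeps the
    original key name, later parts get a numeric suffix before the
    extension. Illustrative, not camel-aws2-s3's implementation."""
    if index == 0:
        return base
    stem, dot, ext = base.rpartition(".")
    return f"{stem}-{index}.{ext}" if dot else f"{base}-{index}"

# Reproduces the list from the post:
for i in range(3):
    print(progressive_key("file_upload_part.txt", i))
```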
The progressive naming strategy raises an obvious question: what happens when the route is stopped and restarted?
Restarting Strategies
The S3 streaming upload producer provides restarting strategies as well; the lastPart strategy, obviously, only makes sense in combination with the progressive naming strategy.
When the route restarts, the producer checks the specified bucket for the configured S3 key name prefix and gets the last uploaded index.
That index is then used to resume from the same point.
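The "recover the last index" step can be sketched like this: scan the existing key names for the configured prefix and extract the highest progressive suffix. Again an illustration of the idea, not the component's code:

```python
import re

def last_uploaded_index(keys, base="file_upload_part.txt"):
    """Sketch of the lastPart restarting idea: look at the existing
    keys with the configured prefix and recover the highest
    progressive index, so the upload can resume from there.
    Illustrative, not camel-aws2-s3's implementation."""
    stem = base.rsplit(".", 1)[0]
    # Match e.g. "file_upload_part-2.txt" and capture the index
    pattern = re.compile(re.escape(stem) + r"-(\d+)\.")
    indexes = [int(m.group(1)) for k in keys if (m := pattern.match(k))]
    return max(indexes, default=0)

existing = ["file_upload_part.txt", "file_upload_part-1.txt",
            "file_upload_part-2.txt"]
print(last_uploaded_index(existing))
```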
Sample
This feature is very nice to see in action.
In the camel-examples repository I added an example of the feature with Kafka as the source.
The example polls the Kafka topic s3.topic.1 and uploads batches of 25 messages (or 1 MB) as single files into an S3 bucket (mycamel-1).
The “how to run” section of the README explains how to ingest data into your Kafka broker.
Conclusion
The streaming upload feature will be useful in situations where users don’t know the amount of data they want to upload to S3, but also when they just want to ingest data continuously without having to care about its size.
There is probably more work to do, but this can be a feature to introduce even in other storage components we have in Apache Camel.