The Documents plugin provides functionality for generating, persisting, combining, and manipulating documents within the FlowX.AI system.
The plugin is available as a Docker image.
Before setting up the plugin, ensure that you have the following dependencies installed and configured:
The plugin comes with pre-filled configuration properties, but you need to set up a few custom environment variables to tailor it to your specific setup. Here are the key configuration steps:
Configure the basic Postgres settings in the values.yaml file.
The plugin can use the Redis component already deployed for the FlowX.AI engine. Make sure it is configured properly.
Ensure that you have a deployed S3-compatible file storage solution, such as MinIO, which will be used to store document files.
To connect to the identity management platform, set the following environment variables:
SECURITY_OAUTH2_BASE_SERVER_URL
SECURITY_OAUTH2_CLIENT_CLIENT_ID
SECURITY_OAUTH2_REALM
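A quick pre-deployment check for these three settings can be sketched as follows. The variable names come from the list above; the helper function is hypothetical and not part of the plugin.

```python
import os

# Names taken from the list above; the checking helper itself is hypothetical.
REQUIRED_OAUTH_VARS = (
    "SECURITY_OAUTH2_BASE_SERVER_URL",
    "SECURITY_OAUTH2_CLIENT_CLIENT_ID",
    "SECURITY_OAUTH2_REALM",
)


def missing_vars(env, required=REQUIRED_OAUTH_VARS):
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing OAuth2 settings: " + ", ".join(missing))
    else:
        print("All OAuth2 settings are present.")
```

Running a check like this before deployment reports a missing or blank value by name, rather than leaving it to surface as a connection failure at startup.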
If you want to use HTML templates for documents, set the FLOWX_HTML_TEMPLATES_ENABLED environment variable to true.
The service uses a Postgres/Oracle database to store data related to document templates and documents. Configure the following details using environment variables:
SPRING_DATASOURCE_URL: The URL for the Postgres/Oracle database.
SPRING_DATASOURCE_USERNAME: The username for the database connection.
SPRING_DATASOURCE_PASSWORD: The password for the database connection.
SPRING_JPA_PROPERTIES_HIBERNATE_DEFAULT_SCHEMA: Use this property to overwrite the name of the database schema if needed.
Ensure that the user, password, connection URL, and database name are correctly configured to avoid startup errors. The datasource is automatically configured using a Liquibase script within the engine, including migration scripts.
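As an illustration of what the connection URL typically looks like, the sketch below builds a JDBC URL for each of the two supported vendors. The host names, ports, and database names are placeholders, and the helper is hypothetical; check the exact URL form your driver version expects.

```python
def jdbc_url(vendor, host, port, database):
    """Build a JDBC URL for the two database vendors mentioned above."""
    if vendor == "postgres":
        return f"jdbc:postgresql://{host}:{port}/{database}"
    if vendor == "oracle":
        # Thin-driver syntax addressing the database by service name.
        return f"jdbc:oracle:thin:@//{host}:{port}/{database}"
    raise ValueError(f"unsupported vendor: {vendor}")


# Placeholder values for illustration only.
print(jdbc_url("postgres", "postgresql", 5432, "document"))
print(jdbc_url("oracle", "oracle-db", 1521, "DOCS"))
```

The resulting string is the value you would assign to SPRING_DATASOURCE_URL.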
Configure the MongoDB database access information by setting the SPRING_DATA_MONGODB_URI environment variable to the MongoDB database URI.
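A MongoDB URI embeds the credentials, so special characters in the user name or password need to be percent-encoded. The sketch below assembles such a URI; the user, password, hosts, and database name are made-up placeholders, and the helper is hypothetical.

```python
from urllib.parse import quote_plus


def mongodb_uri(user, password, hosts, database):
    """Assemble a mongodb:// URI, percent-encoding the credentials."""
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{','.join(hosts)}/{database}"


# Placeholder values for illustration only.
uri = mongodb_uri(
    "app-user",
    "p@ss:word/1",
    ["mongo-0:27017", "mongo-1:27017"],
    "documents",
)
print(uri)
```

Without the encoding step, characters such as @, : or / in the password would be read as URI delimiters.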
Set the following environment variables to the corresponding Redis values:
SPRING_REDIS_HOST: The host address of the Redis server.
SPRING_REDIS_PASSWORD: The password for the Redis server, if applicable.
REDIS_TTL: The time-to-live (TTL) value for Redis cache entries.
Configuration available starting with platform version 3.4.7:
FLOWX_CONVERT_DPI: Sets the DPI (dots per inch) for PDF to JPEG conversion. Higher values result in higher-resolution images. Default value: 150.
Set the following Kafka-related configurations using environment variables:
SPRING_KAFKA_BOOTSTRAP_SERVERS: The address of the Kafka server.
SPRING_KAFKA_CONSUMER_GROUP_ID: The group ID for Kafka consumers.
KAFKA_CONSUMER_THREADS: The number of Kafka consumer threads to use.
KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL: The interval between retries after an AuthorizationException is thrown by KafkaConsumer.
KAFKA_MESSAGE_MAX_BYTES: The maximum size of a message that can be received by the broker from a producer.
Each action in the service corresponds to a Kafka event. Configure a separate Kafka topic for each use case.
KAFKA_TOPIC_DOCUMENT_GENERATE_HTML_IN: Input topic for generating HTML documents (the topic that listens for the request from the engine).
KAFKA_TOPIC_DOCUMENT_GENERATE_HTML_OUT: Output topic for generating HTML documents (the topic on which the engine will expect the reply).
KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_IN: Input topic for generating PDF documents (the topic that listens for the request from the engine).
KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_OUT: Output topic for generating PDF documents; carries the result of generating a PDF document (the topic on which the engine will expect the reply).
KAFKA_TOPIC_FILE_PERSIST_IN: Input topic for persisting files; receives requests to persist a file (the topic that listens for the request from the engine).
KAFKA_TOPIC_FILE_PERSIST_OUT: Output topic for persisting files; carries the result of persisting a file (the topic on which the engine will expect the reply).
KAFKA_TOPIC_DOCUMENT_PERSIST_IN: Input topic for persisting documents; receives requests to persist a document (the topic that listens for the request from the engine).
KAFKA_TOPIC_DOCUMENT_PERSIST_OUT: Output topic for persisting documents; carries the result of persisting a document (the topic on which the engine will expect the reply).
KAFKA_TOPIC_DOCUMENT_SPLIT_IN: Input topic for splitting documents; receives requests to split a document into multiple parts (the topic that listens for the request from the engine).
KAFKA_TOPIC_DOCUMENT_SPLIT_OUT: Output topic for splitting documents; carries the result of splitting a document (the topic on which the engine will expect the reply).
KAFKA_TOPIC_FILE_COMBINE_IN: Input topic for combining files; receives requests to combine multiple files into a single file (the topic that listens for the request from the engine).
KAFKA_TOPIC_FILE_COMBINE_OUT: Output topic for combining files; carries the result of combining files (the topic on which the engine will expect the reply).
KAFKA_TOPIC_DOCUMENT_GET_URLS_IN: Input topic for retrieving document URLs; receives requests to retrieve the URLs of documents (the topic that listens for the request from the engine).
KAFKA_TOPIC_DOCUMENT_GET_URLS_OUT: Output topic for retrieving document URLs; carries the result of retrieving the URLs of documents (the topic on which the engine will expect the reply).
KAFKA_TOPIC_FILE_DELETE_IN: Input topic for deleting files; receives requests to delete a file (the topic that listens for the request from the engine).
KAFKA_TOPIC_FILE_DELETE_OUT: Output topic for deleting files; carries the result of deleting a file (the topic on which the engine will expect the reply).
KAFKA_TOPIC_OCR_OUT: Output topic for optical character recognition (OCR); carries the OCR results (the topic on which the engine will expect the reply).
KAFKA_TOPIC_OCR_IN: Input topic for optical character recognition (OCR); receives requests to perform OCR on a document (the topic that listens for the request from the engine).
Ensure that the Engine is listening to messages on topics with specific patterns. Use the correct outgoing topic names when configuring the documents plugin.
Each of these Kafka topics corresponds to a specific action or functionality within the service, allowing communication and data exchange between different components or services in a decoupled manner.
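Because every action uses an IN/OUT pair and each use case should have its own topic, two simple consistency checks are possible. The sketch below is hypothetical (it is not part of the plugin), and the topic values are invented placeholders.

```python
from collections import Counter


def unpaired_topics(env):
    """Return KAFKA_TOPIC_* names whose _IN/_OUT counterpart is missing."""
    names = {
        name
        for name in env
        if name.startswith("KAFKA_TOPIC_") and name.endswith(("_IN", "_OUT"))
    }
    unpaired = []
    for name in sorted(names):
        if name.endswith("_IN"):
            partner = name[: -len("_IN")] + "_OUT"
        else:
            partner = name[: -len("_OUT")] + "_IN"
        if partner not in names:
            unpaired.append(name)
    return unpaired


def shared_topic_values(env):
    """Return topic values assigned to more than one KAFKA_TOPIC_* variable."""
    counts = Counter(
        value for name, value in env.items() if name.startswith("KAFKA_TOPIC_")
    )
    return sorted(value for value, count in counts.items() if count > 1)


# Placeholder topic names for illustration only.
example = {
    "KAFKA_TOPIC_FILE_PERSIST_IN": "example.file.persist.in",
    "KAFKA_TOPIC_FILE_PERSIST_OUT": "example.file.persist.out",
    "KAFKA_TOPIC_OCR_IN": "example.file.persist.in",
}
print(unpaired_topics(example))
print(shared_topic_values(example))
```

The first check flags an action that has only one of its two topics configured; the second flags a topic name reused across variables.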
Depending on your use case, you can choose either a file system or an S3-compatible cloud storage solution for document storage. Configure the file storage solution using the following environment variables:
APPLICATION_FILE_STORAGE_PARTITION_STRATEGY: Sets the partition strategy for file storage. Use NONE to save documents in MinIO/Amazon S3 as before, with a bucket for each process instance. Use PROCESS_DATE to save documents in a single bucket with a subfolder structure, for example: bucket/2022/2022-07-04/process-id-xxxx/customer-id/file.pdf.
APPLICATION_FILE_STORAGE_DELETION_STRATEGY (default value: delete): Keeps the current behaviour of deleting the temporary files. Other possible values: x-amz-bypass-governance-retention:true, to enable deletion of governed files, provided that the S3 user configured for the documents plugin has the s3:BypassGovernanceRetention permission.
APPLICATION_FILE_STORAGE_S3_SERVER_URL: The URL of the S3-compatible server.
APPLICATION_FILE_STORAGE_S3_ACCESS_KEY: The access key for the S3-compatible server.
APPLICATION_FILE_STORAGE_S3_SECRET_KEY: The secret key for the S3-compatible server.
APPLICATION_FILE_STORAGE_S3_BUCKET_PREFIX: The prefix to use for S3 bucket names.
APPLICATION_FILE_STORAGE_S3_TEMP_BUCKET: The temporary (sandbox) bucket in which an uploaded file first lands, before it is transferred to its designated bucket.
Make sure to follow the recommended bucket naming rules when choosing the bucket prefix name.
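To make the PROCESS_DATE layout concrete, the sketch below reproduces the example path given above from its parts. It only mirrors that one example; which date the plugin actually uses and how it names the segments are assumptions here, and the helper is hypothetical.

```python
from datetime import date


def process_date_path(bucket, day, process_id, customer_id, filename):
    """Compose bucket/<year>/<yyyy-mm-dd>/<process>/<customer>/<file>."""
    return (
        f"{bucket}/{day.year}/{day.isoformat()}/"
        f"{process_id}/{customer_id}/{filename}"
    )


print(
    process_date_path(
        "bucket", date(2022, 7, 4), "process-id-xxxx", "customer-id", "file.pdf"
    )
)
```

Grouping by year and then by day keeps all files in one bucket while still allowing date-scoped listing or cleanup.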
To control the maximum file size permitted for uploads, configure the SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE and SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE variables.
The limit is set by default to 50MB.
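For a sense of what the 50MB default means in bytes, the sketch below parses a size string and checks a file size against it. It assumes binary units (1 MB = 1024 × 1024 bytes), which is how Spring's DataSize type interprets such values; the parser is a simplified, hypothetical stand-in and not the plugin's own code.

```python
import re

# Binary multipliers (assumption: same convention as Spring's DataSize).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '50MB' to bytes; a bare number means bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?B)?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "B"]


def within_limit(size_in_bytes, limit="50MB"):
    """Return True if a file of `size_in_bytes` fits under `limit`."""
    return size_in_bytes <= parse_size(limit)


print(parse_size("50MB"))
print(within_limit(10 * 1024 * 1024))
```

Note that the request-size limit covers the whole multipart request, so it usually needs to be at least as large as the per-file limit.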
Set the FLOWX_HTML_TEMPLATES_PDF_FONT_PATHS config to select the fonts used for generating documents based on PDF templates. By default, Calibri and DejaVuSans are available. If you want to use other fonts in your PDF templates, override this config.
After making these configurations, the fonts will be available for use within PDF templates.
The following environment variables can be set to control log levels:
LOGGING_LEVEL_ROOT: Controls the log level for root Spring Boot microservice logs.
LOGGING_LEVEL_APP: Controls the log level for application-specific logs.
LOGGING_LEVEL_MONGO_DRIVER: Controls the log level for MongoDB driver logs.
Adjust these variables according to your logging requirements.