Documents plugin
The Documents plugin provides functionality for generating, persisting, combining, and manipulating documents within the FlowX.AI system.
The plugin is available as a Docker image.
Dependencies
Before setting up the plugin, ensure that you have the following dependencies installed and configured:
- PostgreSQL Database: You will need a PostgreSQL database to store data related to document templates and documents.
- MongoDB Database: MongoDB is required for the HTML templates feature of the plugin.
- Kafka: Establish a connection to the Kafka instance used by the FLOWX.AI engine.
- Redis: Set up a Redis instance for caching purposes.
- S3-Compatible File Storage Solution: Deploy an S3-compatible file storage solution, such as MinIO, to store document files.
Configuration
The plugin comes with pre-filled configuration properties, but you need to set up a few custom environment variables to tailor it to your specific setup. Here are the key configuration steps:
Postgres database
Configure the basic Postgres settings in the values.yaml file:
documentdb:
  existingSecret: {{secretName}}
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/port: {{prometheus port}}
        prometheus.io/scrape: "true"
      type: ClusterIP
    serviceMonitor:
      additionalLabels:
        release: prometheus-operator
      enabled: true
      interval: 30s
      scrapeTimeout: 10s
  persistence:
    enabled: true
    size: 4Gi
  postgresqlDatabase: document
  postgresqlUsername: postgres
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 200m
      memory: 256Mi
  service:
    annotations:
      fabric8.io/expose: "false"
Redis server
The plugin can utilize the Redis component already deployed for the FLOWX.AI engine. Make sure it is configured properly.
Document storage
Ensure that you have deployed an S3-compatible file storage solution, such as MinIO, to store document files.
Authorization configuration
To connect to the identity management platform, set the following environment variables:
SECURITY_OAUTH2_BASE_SERVER_URL
SECURITY_OAUTH2_CLIENT_CLIENT_ID
SECURITY_OAUTH2_REALM
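As a rough sketch, these variables might be set in the `env` list of the plugin's Kubernetes container spec. The server URL, client ID, and realm values below are hypothetical placeholders, not values from the product; replace them with the details of your identity provider.

```yaml
env:
  # Hypothetical values; replace with your identity provider's details.
  - name: SECURITY_OAUTH2_BASE_SERVER_URL
    value: "https://identity.example.com/auth"
  - name: SECURITY_OAUTH2_CLIENT_CLIENT_ID
    value: "document-plugin-client"
  - name: SECURITY_OAUTH2_REALM
    value: "example-realm"
```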
Enable HTML template types
If you want to use HTML templates for documents, set the FLOWX_HTML_TEMPLATES_ENABLED environment variable to true.
Datasource configuration
The service uses a Postgres/Oracle database to store data related to document templates and documents. Configure the following details using environment variables:
- SPRING_DATASOURCE_URL: The URL for the Postgres/Oracle database.
- SPRING_DATASOURCE_USERNAME: The username for the database connection.
- SPRING_DATASOURCE_PASSWORD: The password for the database connection.
- SPRING_JPA_PROPERTIES_HIBERNATE_DEFAULT_SCHEMA: Use this property to overwrite the name of the database schema if needed.
Ensure that the user, password, connection URL, and database name are correctly configured to avoid startup errors. The datasource is automatically configured using a Liquibase script within the engine, including migration scripts.
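A minimal sketch of a PostgreSQL datasource configuration, assuming the variables are passed through a Kubernetes container `env` list. The host name, Secret name, and Secret key are illustrative assumptions; the database name and user match the values.yaml example earlier on this page.

```yaml
env:
  - name: SPRING_DATASOURCE_URL
    # Standard PostgreSQL JDBC URL; the host name is a placeholder.
    value: "jdbc:postgresql://postgresql.example.svc:5432/document"
  - name: SPRING_DATASOURCE_USERNAME
    value: "postgres"
  - name: SPRING_DATASOURCE_PASSWORD
    # Read the password from a Secret instead of hard-coding it.
    # The Secret name and key are hypothetical.
    valueFrom:
      secretKeyRef:
        name: document-db-secret
        key: postgresql-password
```

Reading the password from a Secret keeps credentials out of the deployment manifest.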
MongoDB configuration
Configure the MongoDB database access information by setting the SPRING_DATA_MONGODB_URI environment variable to the MongoDB database URI.
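For illustration, a MongoDB connection URI generally follows the form `mongodb://user:password@host:port/database`. The sketch below assumes a Kubernetes container `env` list; the user, host, and database name are placeholders.

```yaml
env:
  - name: SPRING_DATA_MONGODB_URI
    # Placeholder credentials, host, and database name.
    value: "mongodb://document_user:changeme@mongodb.example.svc:27017/document_templates"
```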
Redis configuration
Set the following environment variables to the corresponding Redis values:
- SPRING_REDIS_HOST: The host address of the Redis server.
- SPRING_REDIS_PASSWORD: The password for the Redis server, if applicable.
- REDIS_TTL: The time-to-live (TTL) value for Redis cache entries.
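A possible Redis configuration is sketched below, again assuming a Kubernetes container `env` list. The host, Secret name, and TTL value are placeholders; the unit of REDIS_TTL is not stated on this page, so confirm it before choosing a value.

```yaml
env:
  - name: SPRING_REDIS_HOST
    value: "redis-master.example.svc"   # placeholder host
  - name: SPRING_REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: redis-secret              # hypothetical Secret name
        key: redis-password             # hypothetical Secret key
  - name: REDIS_TTL
    value: "5000000"                    # placeholder; confirm the expected unit
```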
Conversion
This configuration is available starting with platform version 3.4.7.
- FLOWX_CONVERT_DPI: Sets the DPI (dots per inch) for PDF to JPEG conversion. Higher values result in higher resolution images. (Default value: 150).
Kafka configuration
Set the following Kafka-related configurations using environment variables:
- SPRING_KAFKA_BOOTSTRAP_SERVERS: The address of the Kafka server.
- SPRING_KAFKA_CONSUMER_GROUP_ID: The group ID for Kafka consumers.
- KAFKA_CONSUMER_THREADS: The number of Kafka consumer threads to use.
- KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL: The interval between retries after an AuthorizationException is thrown by KafkaConsumer.
- KAFKA_MESSAGE_MAX_BYTES: The maximum size of a message that can be received by the broker from a producer.
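The sketch below shows one way these Kafka settings could look in a Kubernetes container `env` list. Every value is an illustrative assumption: the bootstrap address and group ID are placeholders, and the unit of the retry interval is not stated on this page.

```yaml
env:
  - name: SPRING_KAFKA_BOOTSTRAP_SERVERS
    value: "kafka.example.svc:9092"      # placeholder broker address
  - name: SPRING_KAFKA_CONSUMER_GROUP_ID
    value: "document-plugin-group"       # hypothetical group ID
  - name: KAFKA_CONSUMER_THREADS
    value: "1"
  - name: KAFKA_AUTH_EXCEPTION_RETRY_INTERVAL
    value: "10"                          # placeholder; confirm the expected unit
  - name: KAFKA_MESSAGE_MAX_BYTES
    value: "52428800"                    # 50 * 1024 * 1024 bytes = 50 MiB
```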
Each action in the service corresponds to a Kafka event. Configure a separate Kafka topic for each use case.
Generate
- KAFKA_TOPIC_DOCUMENT_GENERATE_HTML_IN: Receives requests to generate HTML documents (the topic that listens for the request from the engine).
- KAFKA_TOPIC_DOCUMENT_GENERATE_HTML_OUT: Carries the result of generating an HTML document (the topic on which the engine will expect the reply).
- KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_IN: Receives requests to generate PDF documents (the topic that listens for the request from the engine).
- KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_OUT: Carries the result of generating a PDF document (the topic on which the engine will expect the reply).
Persist (uploading a file/document)
- KAFKA_TOPIC_FILE_PERSIST_IN: Receives requests to persist a file (the topic that listens for the request from the engine).
- KAFKA_TOPIC_FILE_PERSIST_OUT: Carries the result of persisting a file (the topic on which the engine will expect the reply).
- KAFKA_TOPIC_DOCUMENT_PERSIST_IN: Receives requests to persist a document (the topic that listens for the request from the engine).
- KAFKA_TOPIC_DOCUMENT_PERSIST_OUT: Carries the result of persisting a document (the topic on which the engine will expect the reply).
Split
- KAFKA_TOPIC_DOCUMENT_SPLIT_IN: Receives requests to split a document into multiple parts (the topic that listens for the request from the engine).
- KAFKA_TOPIC_DOCUMENT_SPLIT_OUT: Carries the result of splitting a document (the topic on which the engine will expect the reply).
Combine
- KAFKA_TOPIC_FILE_COMBINE_IN: Receives requests to combine multiple files into a single file (the topic that listens for the request from the engine).
- KAFKA_TOPIC_FILE_COMBINE_OUT: Carries the result of combining files (the topic on which the engine will expect the reply).
Get
- KAFKA_TOPIC_DOCUMENT_GET_URLS_IN: Receives requests to retrieve the URLs of documents (the topic that listens for the request from the engine).
- KAFKA_TOPIC_DOCUMENT_GET_URLS_OUT: Carries the result of retrieving the URLs of documents (the topic on which the engine will expect the reply).
Delete
- KAFKA_TOPIC_FILE_DELETE_IN: Receives requests to delete a file (the topic that listens for the request from the engine).
- KAFKA_TOPIC_FILE_DELETE_OUT: Carries the result of deleting a file (the topic on which the engine will expect the reply).
OCR
- KAFKA_TOPIC_OCR_OUT: Carries the optical character recognition (OCR) results (the topic on which the engine will expect the reply).
- KAFKA_TOPIC_OCR_IN: Receives requests to perform OCR on a document (the topic that listens for the request from the engine).
The engine listens for messages on topics whose names follow specific patterns. When configuring the Documents plugin, make sure each outgoing (OUT) topic name matches a pattern the engine is listening on.
Each of these Kafka topics corresponds to a specific action or functionality within the service, allowing communication and data exchange between different components or services in a decoupled manner.
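To make the IN/OUT pairing concrete, here is a sketch for two of the actions, assuming a Kubernetes container `env` list. The topic names are invented for illustration and do not come from the product; use names that match the pattern your engine is configured to listen on.

```yaml
env:
  # Hypothetical topic names, shown only to illustrate the request/reply pairing.
  - name: KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_IN
    value: "example.document.generate.pdf.in"    # plugin consumes engine requests here
  - name: KAFKA_TOPIC_DOCUMENT_GENERATE_PDF_OUT
    value: "example.document.generate.pdf.out"   # plugin publishes replies for the engine here
  - name: KAFKA_TOPIC_FILE_PERSIST_IN
    value: "example.file.persist.in"
  - name: KAFKA_TOPIC_FILE_PERSIST_OUT
    value: "example.file.persist.out"
```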
File storage configuration
Depending on your use case, you can choose either a file system or an S3-compatible cloud storage solution for document storage. Configure the file storage solution using the following environment variables:
- APPLICATION_FILE_STORAGE_PARTITION_STRATEGY: Sets the partition strategy for file storage. Use NONE to save documents in minio/amazon-s3 as before, with a bucket for each process instance. Use PROCESS_DATE to save documents in a single bucket with a subfolder structure, for example: bucket/2022/2022-07-04/process-id-xxxx/customer-id/file.pdf.
- APPLICATION_FILE_STORAGE_DELETION_STRATEGY (default value: delete): The default keeps the current behavior of deleting the temporary files. Other possible values:
  - disabled: Disables deletion of temporary files from the temporary bucket entirely. Responsibility for deleting files and cleaning up the bucket moves to the administrators of the implementing project.
  - deleteBypassingGovernanceRetention: Still deletes the temporary files and additionally adds the header x-amz-bypass-governance-retention:true to the delete request. This enables deletion of governed files, provided the S3 user configured for the document plugin has the s3:BypassGovernanceRetention permission.
- APPLICATION_FILE_STORAGE_S3_SERVER_URL: The URL of the S3-compatible server.
- APPLICATION_FILE_STORAGE_S3_ACCESS_KEY: The access key for the S3-compatible server.
- APPLICATION_FILE_STORAGE_S3_SECRET_KEY: The secret key for the S3-compatible server.
- APPLICATION_FILE_STORAGE_S3_BUCKET_PREFIX: The prefix to use for S3 bucket names.
- APPLICATION_FILE_STORAGE_S3_TEMP_BUCKET: The temporary (sandbox) bucket. Uploaded files land here first and are then transferred to the designated bucket.
Make sure to follow the recommended bucket naming rules when choosing the bucket prefix name.
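A file storage configuration for an S3-compatible server might look like the sketch below, assuming a Kubernetes container `env` list. The server URL, Secret name, bucket prefix, and temporary bucket name are hypothetical; only the strategy values NONE, PROCESS_DATE, and delete come from the descriptions above.

```yaml
env:
  - name: APPLICATION_FILE_STORAGE_PARTITION_STRATEGY
    value: "PROCESS_DATE"                     # single bucket with dated subfolders
  - name: APPLICATION_FILE_STORAGE_DELETION_STRATEGY
    value: "delete"                           # the documented default
  - name: APPLICATION_FILE_STORAGE_S3_SERVER_URL
    value: "http://minio.example.svc:9000"    # placeholder URL
  - name: APPLICATION_FILE_STORAGE_S3_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: s3-credentials                  # hypothetical Secret name
        key: access-key
  - name: APPLICATION_FILE_STORAGE_S3_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: s3-credentials
        key: secret-key
  - name: APPLICATION_FILE_STORAGE_S3_BUCKET_PREFIX
    value: "example-docs"                     # lowercase letters and hyphens only
  - name: APPLICATION_FILE_STORAGE_S3_TEMP_BUCKET
    value: "example-docs-temp"
```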
Setting maximum file size
To control the maximum file size permitted for uploads, configure the SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE and SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE variables.
The limit is set by default to 50MB:
spring:
  servlet:
    contextPath: /
    multipart:
      max-file-size: ${MULTIPART_MAX_FILE_SIZE:50MB} # increase the multipart file size on the request
      max-request-size: ${MULTIPART_MAX_FILE_SIZE:50MB} # increase the request size
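As an example, raising both limits to 100MB could be sketched as follows, assuming a Kubernetes container `env` list. The 100MB figure is arbitrary. The request size should be at least as large as the file size, because a multipart request contains the file plus its form data.

```yaml
env:
  - name: SPRING_SERVLET_MULTIPART_MAX_FILE_SIZE
    value: "100MB"    # largest single uploaded file
  - name: SPRING_SERVLET_MULTIPART_MAX_REQUEST_SIZE
    value: "100MB"    # largest whole multipart request
```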
Custom font paths for PDF templates
To use specific fonts in your PDF templates, override the FLOWX_HTML_TEMPLATES_PDF_FONT_PATHS config, which selects the fonts used for generating documents based on PDF templates. By default, Calibri and DejaVuSans are available.
After making these configurations, the fonts will be available for use within PDF templates.
Logging
The following environment variables can be set to control log levels:
- LOGGING_LEVEL_ROOT: Controls the log level for root Spring Boot microservice logs.
- LOGGING_LEVEL_APP: Controls the log level for application-specific logs.
- LOGGING_LEVEL_MONGO_DRIVER: Controls the log level for MongoDB driver logs.
Adjust these variables according to your logging requirements.
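One plausible combination, sketched as a Kubernetes container `env` list, keeps framework logs quiet while making application logs more verbose during troubleshooting. The chosen levels are only an example.

```yaml
env:
  - name: LOGGING_LEVEL_ROOT
    value: "INFO"     # general Spring Boot logs
  - name: LOGGING_LEVEL_APP
    value: "DEBUG"    # more detail for application logs while troubleshooting
  - name: LOGGING_LEVEL_MONGO_DRIVER
    value: "WARN"     # reduce MongoDB driver noise
```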