The OCR plugin is a Docker image that can be deployed with the following infrastructure prerequisites.
Starting with ocr-plugin 1.X, the plugin no longer requires RabbitMQ. The following environment variable from previous releases must be removed in order to use the OCR plugin: CELERY_BROKER_URL.
The plugin is deployed using the ocr-plugin helm chart with a custom values file.
The most important sections are listed below; more can be found in the helm chart.
Environment Variable | Definition | Default Value | Example |
---|---|---|---|
ENABLE_KAFKA_SASL | Indicates whether Kafka SASL authentication is enabled | False | - |
KAFKA_ADDRESS | The address of the Kafka bootstrap server in the format <hostname>:<port> | - | kafka-server1:9092 |
KAFKA_CONSUME_SCHEDULE | The interval (in seconds) at which Kafka messages are consumed | 30 | - |
KAFKA_INPUT_TOPIC | The Kafka topic from which input messages are consumed | - | - |
KAFKA_OCR_CONSUMER_GROUPID | The consumer group ID for the OCR Kafka consumer | ocr_group | - |
KAFKA_CONSUMER_AUTO_COMMIT | Determines whether Kafka consumer commits offsets automatically | True | - |
KAFKA_CONSUMER_AUTO_COMMIT_INTERVAL | The interval (in milliseconds) at which Kafka consumer commits offsets automatically | 1000 | - |
KAFKA_CONSUMER_TIMEOUT | The timeout (in milliseconds) for Kafka consumer operations | 28000 | - |
KAFKA_CONSUMER_MAX_POLL_INTERVAL | The maximum interval (in milliseconds) between consecutive polls of the Kafka consumer | 25000 | - |
KAFKA_CONSUMER_AUTO_OFFSET_RESET | The strategy for resetting the offset when no initial offset is available or if the current offset is invalid | earliest | - |
KAFKA_OUTPUT_TOPIC | The Kafka topic to which output messages are sent | - | - |
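As an illustration of how the Kafka variables and their defaults fit together, here is a minimal Python sketch that reads them from an environment mapping. The function names `parse_bool` and `load_kafka_settings`, the keys of the returned dictionary, and the choice to treat variables without a default as required are assumptions made for this example, not the plugin's actual code.

```python
import os
from typing import Mapping


def parse_bool(value: str) -> bool:
    """Interpret common textual spellings of a boolean flag."""
    return value.strip().lower() in {"1", "true", "yes"}


def load_kafka_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the Kafka variables from the table, applying the documented defaults.

    Assumption: variables listed without a default value are treated as required.
    """
    required = ("KAFKA_ADDRESS", "KAFKA_INPUT_TOPIC", "KAFKA_OUTPUT_TOPIC")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))

    return {
        "sasl_enabled": parse_bool(env.get("ENABLE_KAFKA_SASL", "False")),
        "address": env["KAFKA_ADDRESS"],  # <hostname>:<port>
        "consume_schedule_s": int(env.get("KAFKA_CONSUME_SCHEDULE", "30")),
        "input_topic": env["KAFKA_INPUT_TOPIC"],
        "group_id": env.get("KAFKA_OCR_CONSUMER_GROUPID", "ocr_group"),
        "auto_commit": parse_bool(env.get("KAFKA_CONSUMER_AUTO_COMMIT", "True")),
        "auto_commit_interval_ms": int(
            env.get("KAFKA_CONSUMER_AUTO_COMMIT_INTERVAL", "1000")
        ),
        "timeout_ms": int(env.get("KAFKA_CONSUMER_TIMEOUT", "28000")),
        "max_poll_interval_ms": int(
            env.get("KAFKA_CONSUMER_MAX_POLL_INTERVAL", "25000")
        ),
        "auto_offset_reset": env.get("KAFKA_CONSUMER_AUTO_OFFSET_RESET", "earliest"),
        "output_topic": env["KAFKA_OUTPUT_TOPIC"],
    }
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a values file before deployment, for example `load_kafka_settings({"KAFKA_ADDRESS": "kafka-server1:9092", "KAFKA_INPUT_TOPIC": "in", "KAFKA_OUTPUT_TOPIC": "out"})`, where the topic names are placeholders.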
Environment Variable | Definition | Default Value | Example |
---|---|---|---|
OAUTH_CLIENT_ID | The client ID for OAuth authentication | - | your_client_id |
OAUTH_CLIENT_SECRET | The client secret for OAuth authentication | - | your_client_secret |
OAUTH_TOKEN_ENDPOINT_URI | The URI of the token endpoint for OAuth authentication | - | https://oauth.example.com/token |
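The three OAuth variables are the usual inputs of a token request. The sketch below builds such a request with the Python standard library without sending it. It assumes the OAuth 2.0 client credentials grant with the credentials passed in the form body; the source does not state which grant type or client authentication method the plugin uses, and `build_token_request` is a hypothetical helper name.

```python
import os
import urllib.parse
import urllib.request
from typing import Mapping


def build_token_request(env: Mapping[str, str] = os.environ) -> urllib.request.Request:
    """Build (but do not send) a token request from the OAuth variables.

    Assumption: client credentials grant, credentials sent in the form body.
    """
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": env["OAUTH_CLIENT_ID"],
            "client_secret": env["OAUTH_CLIENT_SECRET"],
        }
    ).encode("ascii")
    return urllib.request.Request(
        env["OAUTH_TOKEN_ENDPOINT_URI"],
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; keeping construction separate from sending makes the request easy to inspect when debugging credentials.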
Environment Variable | Definition | Default Value | Example |
---|---|---|---|
STORAGE_S3_HOST | The host address of the S3 storage service | - | minio:9000 , https://s3.eu-west-1.amazonaws.com/ |
STORAGE_S3_SECURE_CONNECTION | Indicates whether to use a secure connection (HTTPS) for S3 storage | False | |
STORAGE_S3_LOCATION | The location of the S3 storage service | - | eu-west-1 |
STORAGE_S3_OCR_SCANS_BUCKET | The name of the S3 bucket for storing OCR scans | - | pdf-scans |
STORAGE_S3_OCR_SIGNATURE_BUCKET | The name of the S3 bucket for storing OCR signatures | - | extracted-signatures |
STORAGE_S3_OCR_SIGNATURE_FILENAME | The filename pattern for extracted OCR signatures | - | extracted_signature_{}.png |
STORAGE_S3_ACCESS_KEY | The access key for connecting to the S3 storage service | - | |
STORAGE_S3_SECRET_KEY | The secret key for connecting to the S3 storage service | - | |
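Two of the S3 settings benefit from a concrete example: the host, which the table shows both with and without a scheme, and the filename pattern. The sketch below is one possible interpretation. It assumes that a host without a scheme gets `http` or `https` depending on STORAGE_S3_SECURE_CONNECTION, and that the `{}` in the pattern is a Python `str.format` placeholder filled with an index; both helper names are hypothetical.

```python
def s3_endpoint_url(host: str, secure: bool) -> str:
    """Turn STORAGE_S3_HOST into a full endpoint URL.

    Assumption: a host that already carries a scheme is used as given,
    otherwise the scheme follows STORAGE_S3_SECURE_CONNECTION.
    """
    if host.startswith(("http://", "https://")):
        return host.rstrip("/")
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}"


def signature_object_name(pattern: str, index: int) -> str:
    """Fill the placeholder in STORAGE_S3_OCR_SIGNATURE_FILENAME.

    Assumption: the placeholder receives a running index.
    """
    return pattern.format(index)
```

With the example values from the table, `s3_endpoint_url("minio:9000", False)` gives `http://minio:9000` and `signature_object_name("extracted_signature_{}.png", 3)` gives `extracted_signature_3.png`.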
Environment Variable | Definition | Default Value |
---|---|---|
ENABLE_PERFORMANCE_PAYLOAD | When set to true, the response payload will contain performance metrics related to various stages of the process. | true |
Environment Variable | Definition | Default Value |
---|---|---|
REQUESTS_CA_BUNDLE | The path to the certificate bundle file used for secure requests | 5 |
CERT_REQUESTS | The certificate verification mode applied to secure requests | 'CERT_REQUIRED' |
Environment Variable | Definition | Default Value |
---|---|---|
OCR_WORKER_COUNT | Number of workers | 5 |
OCR_WORK_QUEUE_TIMEOUT | If no activity has occurred for a certain number of seconds, an attempt will be made to refresh the workers | 10 |
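The timeout check behind OCR_WORK_QUEUE_TIMEOUT can be pictured as a small decision function. This is only a sketch of the documented behavior: the `Action` enum, the function name `watchdog_action`, and the representation of worker state as a list of booleans are assumptions for illustration, not the plugin's internals.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    NONE = "none"                                  # queue is still active
    RESTART_DEAD_WORKERS = "restart_dead_workers"  # some workers have died
    SHUT_DOWN = "shut_down"                        # all alive but likely blocked


def watchdog_action(
    idle_seconds: float,
    timeout_seconds: float,
    workers_alive: Sequence[bool],
) -> Action:
    """Decide what to do after a period without work-queue activity."""
    if idle_seconds < timeout_seconds:
        return Action.NONE
    if not all(workers_alive):
        # At least one worker died: it needs to be restarted.
        return Action.RESTART_DEAD_WORKERS
    # Nobody died, yet nothing is progressing: terminate everything and
    # exit so that the container can be restarted.
    return Action.SHUT_DOWN
```

With the default values, `watchdog_action(12, 10, [True] * 5)` returns `Action.SHUT_DOWN`, while the same call with one `False` entry returns `Action.RESTART_DEAD_WORKERS`.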
If no activity has occurred for OCR_WORK_QUEUE_TIMEOUT seconds, the application verifies whether any workers have become unresponsive and need to be restarted. If none of the workers have died, they are likely blocked in some process. In this case, the application terminates all the workers and shuts itself down, expecting the container to be restarted.

Environment Variable | Definition | Default Value |
---|---|---|
OCR_SIGNATURE_MAX_RATIO | The maximum acceptable aspect ratio for a signed scanned document (the OCR plugin will consider a detected signature only if the document aspect ratio is less than or equal to this value) | 1.43 |
OCR_SIGNATURE_MIN_RATIO | The minimum acceptable aspect ratio for a signed scanned document (the OCR plugin will consider a detected signature only if the document aspect ratio is greater than or equal to this value) | 1.39 |
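The two ratio settings together define an accepted range. The sketch below shows the range check under one assumption that the source does not confirm: the aspect ratio is computed as the longer side divided by the shorter side. The function names are hypothetical.

```python
def aspect_ratio(width: float, height: float) -> float:
    """Longer side divided by shorter side (assumed definition)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return max(width, height) / min(width, height)


def ratio_in_range(
    width: float,
    height: float,
    min_ratio: float = 1.39,  # OCR_SIGNATURE_MIN_RATIO default
    max_ratio: float = 1.43,  # OCR_SIGNATURE_MAX_RATIO default
) -> bool:
    """True when the document ratio lies within [min_ratio, max_ratio]."""
    return min_ratio <= aspect_ratio(width, height) <= max_ratio
```

Under this definition an A4 page (210 x 297 mm, ratio about 1.414) falls inside the default range, whereas a US Letter page (216 x 279 mm, ratio about 1.29) does not.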