# Data-Sync job setup

> Comprehensive guide for configuring and deploying the Data-Sync Job in your Kubernetes environment

## Overview

The Data-Sync Job synchronizes data across multiple databases so that information stays consistent and up to date throughout your system. It connects to each configured database, retrieves data, and propagates changes between them. The job logs all actions and can be scheduled to run at regular intervals.

***

## Quick start

```bash theme={"system"}
# 1. Configure your environment variables in a data-sync-job.yaml file
# 2. Apply the configuration
kubectl apply -f data-sync-job.yaml
# 3. Monitor the job status
kubectl get jobs
# 4. Check logs if needed
kubectl logs job/data-sync-job
```

***

## Required environment variables

### Core configuration

| Variable                        | Description                                                     | Example                               |
| ------------------------------- | --------------------------------------------------------------- | ------------------------------------- |
| `FLOWX_SKIPPEDRESOURCESERVICES` | Comma-separated list of services to skip during synchronization | `document-plugin,notification-plugin` |

> ⚠️ **Warning**: Do not include spaces in the `FLOWX_SKIPPEDRESOURCESERVICES` value.
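
As an illustration of the formatting rule above, the fragment below contrasts a valid value with one that contains a space. The service names are taken from the example column; which services you skip depends on your deployment.

```yaml theme={"system"}
env:
  # Valid: service names separated by commas only
  - name: FLOWX_SKIPPEDRESOURCESERVICES
    value: "document-plugin,notification-plugin"
  # Invalid: the space after the comma must be removed
  # - name: FLOWX_SKIPPEDRESOURCESERVICES
  #   value: "document-plugin, notification-plugin"
```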

### Database connections

The Data-Sync Job requires connection details for multiple databases. Configure the following sections based on your deployment.

#### MongoDB connections

Each MongoDB-based service requires the following variables:

| Component                | Required Variables                                                                                                                                           |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **CMS**                  | `FLOWX_DATASOURCE_CMS_URI`, `CMS_MONGO_USERNAME`, `CMS_MONGO_PASSWORD`, `CMS_MONGO_DATABASE`                                                                 |
| **Scheduler**            | `FLOWX_DATASOURCE_SCHEDULER_URI`, `SCHEDULER_MONGO_USERNAME`, `SCHEDULER_MONGO_PASSWORD`, `SCHEDULER_MONGO_DATABASE`                                         |
| **Task Manager**         | `FLOWX_DATASOURCE_TASKMANAGER_URI`, `TASKMANAGER_MONGO_USERNAME`, `TASKMANAGER_MONGO_PASSWORD`, `TASKMANAGER_MONGO_DATABASE`                                 |
| **Document Plugin**      | `FLOWX_DATASOURCE_DOCUMENTPLUGIN_URI`, `DOCUMENTPLUGIN_MONGO_USERNAME`, `DOCUMENTPLUGIN_MONGO_PASSWORD`, `DOCUMENTPLUGIN_MONGO_DATABASE`                     |
| **Notification Plugin**  | `FLOWX_DATASOURCE_NOTIFICATIONPLUGIN_URI`, `NOTIFICATIONPLUGIN_MONGO_USERNAME`, `NOTIFICATIONPLUGIN_MONGO_PASSWORD`, `NOTIFICATIONPLUGIN_MONGO_DATABASE`     |
| **App Runtime**          | `FLOWX_DATASOURCE_APPRUNTIME_URI`, `APPRUNTIME_MONGO_USERNAME`, `APPRUNTIME_MONGO_PASSWORD`, `APPRUNTIME_MONGO_DATABASE`                                     |
| **Integration Designer** | `FLOWX_DATASOURCE_INTEGRATIONDESIGNER_URI`, `INTEGRATIONDESIGNER_MONGO_USERNAME`, `INTEGRATIONDESIGNER_MONGO_PASSWORD`, `INTEGRATIONDESIGNER_MONGO_DATABASE` |
| **Admin**                | `FLOWX_DATASOURCE_ADMIN_URI`, `ADMIN_MONGO_USERNAME`, `ADMIN_MONGO_PASSWORD`, `ADMIN_MONGO_DATABASE`                                                         |

##### MongoDB URI format

```
mongodb://${USERNAME}:${PASSWORD}@mongodb-0.mongodb-headless,mongodb-1.mongodb-headless,mongodb-arbiter-0.mongodb-arbiter-headless:27017/${DATABASE}
```
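
A minimal sketch of how the four variables for one component might fit together, using the Scheduler as an example. The Secret name `data-sync-mongodb`, its keys, and the database name `scheduler` are hypothetical placeholders, and the fragment assumes the `${...}` placeholders are resolved from the other environment variables, as in the URI format above.

```yaml theme={"system"}
env:
  - name: SCHEDULER_MONGO_USERNAME
    valueFrom:
      secretKeyRef:
        name: data-sync-mongodb      # hypothetical Secret name
        key: scheduler-username      # hypothetical key
  - name: SCHEDULER_MONGO_PASSWORD
    valueFrom:
      secretKeyRef:
        name: data-sync-mongodb      # hypothetical Secret name
        key: scheduler-password      # hypothetical key
  - name: SCHEDULER_MONGO_DATABASE
    value: "scheduler"               # hypothetical database name
  # Replica set hosts follow the URI format shown above
  - name: FLOWX_DATASOURCE_SCHEDULER_URI
    value: "mongodb://${SCHEDULER_MONGO_USERNAME}:${SCHEDULER_MONGO_PASSWORD}@mongodb-0.mongodb-headless,mongodb-1.mongodb-headless,mongodb-arbiter-0.mongodb-arbiter-headless:27017/${SCHEDULER_MONGO_DATABASE}"
```

The same pattern would repeat for each MongoDB-based component in the table, with the component prefix changed accordingly.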

#### PostgreSQL connections

| Component                 | Required Variables                                                                                                                                               |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Process Engine**        | `FLOWX_DATASOURCE_ENGINE_URL`, `FLOWX_DATASOURCE_ENGINE_USERNAME`, `FLOWX_DATASOURCE_ENGINE_PASSWORD`, `FLOWX_DATASOURCE_ENGINE_DRIVERCLASSNAME`                 |
| **Application Manager**   | `FLOWX_DATASOURCE_APPMANAGER_URL`, `FLOWX_DATASOURCE_APPMANAGER_USERNAME`, `FLOWX_DATASOURCE_APPMANAGER_PASSWORD`, `FLOWX_DATASOURCE_APPMANAGER_DRIVERCLASSNAME` |
| **Authentication System** | `FLOWX_DATASOURCE_AUTHSYSTEM_URL`, `FLOWX_DATASOURCE_AUTHSYSTEM_USERNAME`, `FLOWX_DATASOURCE_AUTHSYSTEM_PASSWORD`                                                |
| **Organization Manager**  | `FLOWX_DATASOURCE_ORGMANAGER_URL`, `FLOWX_DATASOURCE_ORGMANAGER_USERNAME`, `FLOWX_DATASOURCE_ORGMANAGER_PASSWORD`, `FLOWX_DATASOURCE_ORGMANAGER_DRIVERCLASSNAME` |

### OpenID provider configuration

Starting with version 5.9.0, the Data-Sync Job provisions FlowX-managed end-user groups and runtime roles in Keycloak as part of the [runtime authorization](./access-management/runtime-authorization) rollout. Provide admin credentials so the job can call the Keycloak admin API.

| Environment Variable      | Description                                                                                                   | Default Value |
| ------------------------- | ------------------------------------------------------------------------------------------------------------- | ------------- |
| `FLOWX_OPENID_PROVIDER`   | OpenID provider type. Set to `keycloak` (the only supported provider).                                        | `-`           |
| `FLOWX_OPENID_SERVER_URL` | Base URL of the Keycloak server, including the `/auth/` path (for example, `https://auth.example.com/auth/`). | `-`           |
| `FLOWX_OPENID_USER`       | Username of a Keycloak admin account that can manage groups and roles in the FlowX realm.                     | `-`           |
| `FLOWX_OPENID_PASSWORD`   | Password for the admin account above. Store in a Kubernetes Secret.                                           | `-`           |
| `FLOWX_OPENID_CLIENT_ID`  | Keycloak admin client used to obtain the admin token.                                                         | `admin-cli`   |
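
The fragment below sketches how these variables could be set on the job container. The server URL is the example value from the table; the Secret name `data-sync-keycloak` and its keys are hypothetical and should be replaced with whatever your cluster uses.

```yaml theme={"system"}
env:
  - name: FLOWX_OPENID_PROVIDER
    value: "keycloak"
  - name: FLOWX_OPENID_SERVER_URL
    value: "https://auth.example.com/auth/"   # example URL, note the /auth/ path
  - name: FLOWX_OPENID_CLIENT_ID
    value: "admin-cli"                        # default value, shown for clarity
  - name: FLOWX_OPENID_USER
    valueFrom:
      secretKeyRef:
        name: data-sync-keycloak              # hypothetical Secret name
        key: admin-username                   # hypothetical key
  - name: FLOWX_OPENID_PASSWORD
    valueFrom:
      secretKeyRef:
        name: data-sync-keycloak              # hypothetical Secret name
        key: admin-password                   # hypothetical key
```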

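### JDBC driver class names

Set each `*_DRIVERCLASSNAME` variable listed under PostgreSQL connections to the class that matches your database: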

* PostgreSQL: `org.postgresql.Driver`
* Oracle: `oracle.jdbc.OracleDriver`
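
As a sketch of how a driver class name fits with the other relational connection variables, the fragment below configures the Process Engine datasource for PostgreSQL. The host `postgresql`, the database `process_engine`, and the Secret `data-sync-postgresql` with its keys are hypothetical; only the variable names and the driver class come from this page.

```yaml theme={"system"}
env:
  - name: FLOWX_DATASOURCE_ENGINE_URL
    # Standard PostgreSQL JDBC URL: jdbc:postgresql://<host>:<port>/<database>
    value: "jdbc:postgresql://postgresql:5432/process_engine"   # hypothetical host and database
  - name: FLOWX_DATASOURCE_ENGINE_USERNAME
    valueFrom:
      secretKeyRef:
        name: data-sync-postgresql   # hypothetical Secret name
        key: engine-username         # hypothetical key
  - name: FLOWX_DATASOURCE_ENGINE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: data-sync-postgresql   # hypothetical Secret name
        key: engine-password         # hypothetical key
  - name: FLOWX_DATASOURCE_ENGINE_DRIVERCLASSNAME
    value: "org.postgresql.Driver"
```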

### Additional configuration

| Variable                                        | Description                                                 |
| ----------------------------------------------- | ----------------------------------------------------------- |
| `SPRING_JPA_DATABASE`                           | Database type for Spring JPA (e.g., `postgresql`, `oracle`) |
| `SPRING_JPA_PROPERTIES_HIBERNATE_DEFAULTSCHEMA` | Default schema for Hibernate                                |
| `LOGGING_CONFIG_FILE`                           | Path to logging configuration file                          |

***

## Service to database mapping

Each service in your environment maps to one or more datasources:

| Service                  | Datasources              |
| ------------------------ | ------------------------ |
| `scheduler-core`         | scheduler                |
| `cms-core`               | cms                      |
| `task-management-plugin` | task-manager             |
| `document-plugin`        | document-plugin          |
| `notification-plugin`    | notification-plugin      |
| `runtime-manager`        | app-runtime, app-manager |
| `integration-designer`   | integration-designer     |
| `admin`                  | admin, engine            |
| `process-engine`         | engine                   |
| `application-manager`    | app-manager              |
| `authorization-system`   | auth-system              |

***

## Sample configuration

```yaml theme={"system"}
apiVersion: batch/v1
kind: Job
metadata:
  name: data-sync-job
spec:
  template:
    spec:
      containers:
      - name: data-sync
        image: your-registry/data-sync:latest
        env:
        - name: FLOWX_SKIPPEDRESOURCESERVICES
          value: "document-plugin,notification-plugin"
        # MongoDB connections
        - name: FLOWX_DATASOURCE_CMS_URI
          value: "mongodb://${CMS_MONGO_USERNAME}:${CMS_MONGO_PASSWORD}@mongodb-0.mongodb-headless:27017/${CMS_MONGO_DATABASE}"
        # Add all other required environment variables
      restartPolicy: Never
  backoffLimit: 3
```

***

## SpiceDB configuration

| Environment Variable  | Description                  | Default Value |
| --------------------- | ---------------------------- | ------------- |
| `FLOWX_SPICEDB_HOST`  | SpiceDB server hostname      | `spicedb`     |
| `FLOWX_SPICEDB_PORT`  | SpiceDB server port          | `50051`       |
| `FLOWX_SPICEDB_TOKEN` | SpiceDB authentication token | `-`           |
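
A minimal sketch of the SpiceDB settings on the job container. Host and port are the defaults from the table; the Secret name `data-sync-spicedb` and its key are hypothetical.

```yaml theme={"system"}
env:
  - name: FLOWX_SPICEDB_HOST
    value: "spicedb"                 # default value
  - name: FLOWX_SPICEDB_PORT
    value: "50051"                   # default value, quoted because env values must be strings
  - name: FLOWX_SPICEDB_TOKEN
    valueFrom:
      secretKeyRef:
        name: data-sync-spicedb      # hypothetical Secret name
        key: token                   # hypothetical key
```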

***

## Best practices

1. Store sensitive credentials in Kubernetes Secrets and reference them in your deployment
2. Include the Data-Sync Job in your CI/CD pipeline for automated deployment
3. Schedule regular runs using a Kubernetes CronJob for periodic synchronization
4. Monitor job execution and set up alerts for failures
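
The scheduling and Secret recommendations above could be combined roughly as follows. This is a sketch, not a reference manifest: the CronJob name, the schedule, and the Secret `data-sync-credentials` are hypothetical, and the image is the placeholder from the sample configuration.

```yaml theme={"system"}
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-sync-cronjob            # hypothetical name
spec:
  schedule: "0 2 * * *"              # hypothetical schedule: every day at 02:00
  concurrencyPolicy: Forbid          # skip a run if the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: data-sync
            image: your-registry/data-sync:latest
            envFrom:
            # Loads every key of the Secret as an environment variable
            - secretRef:
                name: data-sync-credentials   # hypothetical Secret name
            env:
            - name: FLOWX_SKIPPEDRESOURCESERVICES
              value: "document-plugin,notification-plugin"
            # Add all other required environment variables
```

`concurrencyPolicy: Forbid` is chosen here so that two runs never overlap, which fits the troubleshooting advice about avoiding concurrent writes during a sync window.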

***

## Troubleshooting

<AccordionGroup>
  <Accordion title="Database connection failures">
    **Symptoms:** The Data-Sync Job fails to start or exits with database connection errors.

    **Solutions:**

    1. Verify MongoDB and PostgreSQL connection strings are correctly formatted
    2. Check that database credentials are correct and the user has appropriate permissions
    3. Ensure network connectivity between the job pod and database services
    4. For MongoDB, confirm the replica set is healthy and reachable
    5. For PostgreSQL, verify the JDBC URL format and driver class name
  </Accordion>

  <Accordion title="Sync failures">
    **Symptoms:** Data synchronization completes partially or fails.

    **Note:** The Data-Sync Job does **not** use Kafka. It is a batch job that connects directly to each service's database to synchronize data.

    **Solutions:**

    1. Check job logs for specific error messages: `kubectl logs job/data-sync-job`
    2. Verify all database connection strings are correct and reachable from the job pod
    3. Ensure database users have the required read/write permissions
    4. Confirm that services not installed are listed in `FLOWX_SKIPPEDRESOURCESERVICES`
  </Accordion>

  <Accordion title="Data inconsistencies after sync">
    **Symptoms:** Data across services appears out of date or mismatched after the job completes.

    **Solutions:**

    1. Verify that the correct services are being synced and none are accidentally listed in `FLOWX_SKIPPEDRESOURCESERVICES`
    2. Check the service-to-database mapping to ensure each service points to the right datasource
    3. Re-run the Data-Sync Job and monitor logs for partial failures
    4. Confirm that no concurrent writes occurred during the sync window
  </Accordion>

  <Accordion title="Missing required variables">
    **Symptoms:** Job fails immediately on startup with configuration errors.

    **Solutions:**

    1. Ensure all required environment variables are set for each database connection
    2. Check for typos in environment variable names
    3. If a service is not installed, add it to `FLOWX_SKIPPEDRESOURCESERVICES` instead of leaving its variables unconfigured
  </Accordion>
</AccordionGroup>

***

## Related resources

<CardGroup cols={2}>
  <Card title="Redis Configuration" icon="database" href="./redis-configuration">
    Complete Redis setup including Sentinel and Cluster modes
  </Card>

  <Card title="Kafka Authentication" icon="lock" href="./kafka-authentication-config">
    Configure Kafka security and authentication
  </Card>

  <Card title="IAM Configuration" icon="key" href="./access-management/configuring-an-iam-solution">
    Identity and access management setup
  </Card>
</CardGroup>
