Version: 3.4.x

Reporting setup

Introduction

This guide explains how to configure the reporting plugin, including the dependencies it requires and the configuration each of them needs.

Dependencies

The reporting plugin, available as a Docker image, requires the following dependencies:

  • PostgreSQL: Dedicated instance for reporting data storage.
  • Reporting-plugin Helm Chart:
    • Utilizes a Spark Application to extract data from the FLOWX.AI Engine database and populate the Reporting plugin database.
    • Utilizes Spark Operator (more info here).
  • Superset:
    • Requires a dedicated PostgreSQL database for its operation.
    • Utilizes Redis for efficient caching.
    • Exposes its user interface via an ingress.

Reporting plugin Helm chart configuration

Configuring the reporting plugin involves several steps:

Installation of Spark Operator

  1. Install the Spark Operator using Helm:

helm install local-spark-release spark-operator/spark-operator \
--namespace spark-operator --create-namespace \
--set webhook.enable=true \
--set logLevel=6

  2. Apply the RBAC configuration:

kubectl apply -f spark-rbac.yaml

  3. Build the reporting image:

docker build ...

  4. Update the reporting image URL in the spark-app.yaml file.

  5. Configure the correct database environment variables in the spark-app.yaml file (see the examples below, with and without webhook).

  6. Deploy the application:

kubectl apply -f operator/spark-app.yaml

Spark Operator deployment options

Without webhook

For deployments without a webhook, the database credentials are passed to the driver and executor as plain environment variables:

sparkApplication: # Defines the Spark application configuration.
  enabled: "true" # Indicates that the Spark application is enabled for deployment.
  schedule: "@every 5m" # Schedule for the job: runs every 5 minutes.
  driver: # Configures the driver component of the Spark application.
    envVars: # Environment variables for driver setup.
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
      ENGINE_DATABASE_PASSWORD: "password"
      REPORTING_DATABASE_PASSWORD: "password"
  executor: # Configures the executor component of the Spark application.
    envVars: # Environment variables for executor setup.
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
      ENGINE_DATABASE_PASSWORD: "password"
      REPORTING_DATABASE_PASSWORD: "password"
Note: The passwords in this example are set as plain strings, which is not secure practice in a production environment.

With webhook

When using the webhook, keep the non-sensitive settings in environment variables and load the passwords from secrets:

sparkApplication:
  enabled: "true"
  schedule: "@every 5m"
  driver:
    env: # Environment variables for driver setup with secrets.
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
    extraEnvVarsMultipleSecretsCustomKeys:
      - name: postgresql-generic
        secrets: # Secrets retrieved from a generic source.
          ENGINE_DATABASE_PASSWORD: postgresql-password
          REPORTING_DATABASE_PASSWORD: postgresql-password
  executor:
    env: # Environment variables for executor setup with secrets.
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
    extraEnvVarsMultipleSecretsCustomKeys:
      - name: postgresql-generic
        secrets: # Secrets retrieved from a generic source.
          ENGINE_DATABASE_PASSWORD: postgresql-password
          REPORTING_DATABASE_PASSWORD: postgresql-password
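
The postgresql-generic secret named under extraEnvVarsMultipleSecretsCustomKeys has to exist in the cluster before the job runs. The following is a minimal sketch of how such a secret could be generated, assuming it is a standard Opaque Kubernetes Secret that stores the password under the key postgresql-password. The helper name build_secret_manifest, the dev namespace, and the placeholder password are illustrative and not part of the chart.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, values: dict) -> dict:
    """Return a Kubernetes Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    manifest = build_secret_manifest(
        name="postgresql-generic",       # secret name used in the values above
        namespace="dev",                 # assumption: adjust to your namespace
        values={"postgresql-password": "change-me"},  # placeholder password
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f - to create the secret.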
In Kubernetes-based Spark deployments managed by the Spark Operator, the sparkApplication configuration lets you customize the behavior, resources, and environment of both the driver and executor components of Spark jobs. The driver section holds the parameters that apply specifically to the driver part of the Spark application.

Below are the configurable values within the chart values.yml file (with webhook):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: reporting-plugin-spark-app
  namespace: dev
  labels:
    app.kubernetes.io/component: reporting
    app.kubernetes.io/instance: reporting-plugin
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: reporting-plugin
    app.kubernetes.io/release: 0.0.1-FLOWXRELEASE
    app.kubernetes.io/version: 0.0.1-FLOWXVERSION
    helm.sh/chart: reporting-plugin-0.1.1-PR-9-4-20231122153650-e
spec:
  schedule: '@every 5m'
  concurrencyPolicy: Forbid
  template:
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: eu.gcr.io/prj-cicd-d-flowxai-jx-6401/reporting-plugin:0.1.1-PR-9-4-20231122153650-eb6c
    imagePullPolicy: IfNotPresent
    mainApplicationFile: local:///opt/spark/work-dir/main.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
      onFailureRetries: 0
      onFailureRetryInterval: 10
      onSubmissionFailureRetries: 5
      onSubmissionFailureRetryInterval: 20
    driver:
      cores: 1
      coreLimit: 1200m
      memory: 512m
      labels:
        version: 3.1.1
      serviceAccount: spark
      env:
        ENGINE_DATABASE_USER: flowx
        ENGINE_DATABASE_URL: postgresql:5432
        ENGINE_DATABASE_NAME: process_engine
        ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
        REPORTING_DATABASE_USER: flowx
        REPORTING_DATABASE_URL: postgresql:5432
        REPORTING_DATABASE_NAME: reporting
        ENGINE_DATABASE_PASSWORD: "password"
        REPORTING_DATABASE_PASSWORD: "password"
      extraEnvVarsMultipleSecretsCustomKeys:
        - name: postgresql-generic
          secrets: # Secrets retrieved from a generic source.
            ENGINE_DATABASE_PASSWORD: postgresql-password
            REPORTING_DATABASE_PASSWORD: postgresql-password
    executor:
      cores: 1
      instances: 3
      memory: 512m
      labels:
        version: 3.1.1
      env: # Environment variables for executor setup with secrets.
        ENGINE_DATABASE_USER: flowx
        ENGINE_DATABASE_URL: postgresql:5432
        ENGINE_DATABASE_NAME: process_engine
        ENGINE_DATABASE_TYPE: postgres # Engine database type; can also be set to oracle.
        REPORTING_DATABASE_USER: flowx
        REPORTING_DATABASE_URL: postgresql:5432
        REPORTING_DATABASE_NAME: reporting
      extraEnvVarsMultipleSecretsCustomKeys:
        - name: postgresql-generic
          secrets: # Secrets retrieved from a generic source.
            ENGINE_DATABASE_PASSWORD: postgresql-password
            REPORTING_DATABASE_PASSWORD: postgresql-password

Superset configuration

Detailed Superset configuration guide:

» Superset configuration

Refer to the Superset documentation for in-depth information:

» Superset documentation

Post-installation steps

After installation, perform the following essential configurations:

Datasource configuration

For document-related data storage, configure these environment variables:

  • SPRING_DATASOURCE_URL
  • SPRING_DATASOURCE_USERNAME
  • SPRING_DATASOURCE_PASSWORD

Make sure these values are accurate, because incorrect details cause startup errors. The Liquibase script manages the schema and migrations.
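
Since a missing datasource value only surfaces at startup, it can help to check the environment beforehand. The sketch below is one possible pre-flight check, not part of the plugin: the helper name missing_datasource_vars is hypothetical, and it only verifies that the three variables listed above are set and non-empty.

```python
import os

# The three variables named in this section.
REQUIRED_DATASOURCE_VARS = (
    "SPRING_DATASOURCE_URL",
    "SPRING_DATASOURCE_USERNAME",
    "SPRING_DATASOURCE_PASSWORD",
)


def missing_datasource_vars(env=None):
    """Return the required datasource variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DATASOURCE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_datasource_vars()
    if missing:
        raise SystemExit("Missing datasource settings: " + ", ".join(missing))
    print("All datasource variables are set.")
```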

Redis configuration

Set the following variables to match your Redis instance:

  • SPRING_REDIS_HOST
  • SPRING_REDIS_PORT
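
As a rough illustration of how these two values might be validated before deployment, the sketch below reads them from the environment and optionally tries a plain TCP connection. The function names redis_endpoint and is_reachable are hypothetical, and the check only confirms that the port is open, not that a Redis server is answering.

```python
import os
import socket


def redis_endpoint(env=None):
    """Read and validate SPRING_REDIS_HOST and SPRING_REDIS_PORT."""
    env = os.environ if env is None else env
    host = env.get("SPRING_REDIS_HOST", "").strip()
    port_text = env.get("SPRING_REDIS_PORT", "").strip()
    if not host:
        raise ValueError("SPRING_REDIS_HOST is not set")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(f"SPRING_REDIS_PORT is not a number: {port_text!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"SPRING_REDIS_PORT is out of range: {port}")
    return host, port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = redis_endpoint()
    print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```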

Keycloak configuration

To implement alternative user authentication:

  • Override AUTH_TYPE in your superset.yml configuration file:
    • Set AUTH_TYPE: AUTH_OID
  • Provide the reference to your openid-connect realm:
    • OIDC_OPENID_REALM: 'flowx'

With this configuration, the login page changes to a prompt where the user can select the desired OpenID provider.

Extend the security manager

First, make sure that Flask stops using flask-openid and uses flask-oidc instead.

To do so, create your own security manager that configures flask-oidc as its authentication provider.

extraSecrets:
  keycloak_security_manager.py: |
    from flask_appbuilder.security.manager import AUTH_OID
    from superset.security import SupersetSecurityManager
    from flask_oidc import OpenIDConnect

To enable OpenID in Superset, you would previously have had to set the authentication type to AUTH_OID.

The custom security manager (referenced later as OIDCSecurityManager) still executes all the behavior of its superclass, SupersetSecurityManager, but overrides the oid attribute with an OpenIDConnect object.

Further, it replaces the default OpenID authentication view with a custom one:

    from flask_appbuilder.security.views import AuthOIDView
    from flask_login import login_user
    from urllib.parse import quote
    from flask_appbuilder.views import expose
    from flask import request, redirect

    class AuthOIDCView(AuthOIDView):
        @expose('/login/', methods=['GET', 'POST'])
        def login(self, flag=True):
            sm = self.appbuilder.sm
            oidc = sm.oid
            superset_roles = ["Admin", "Alpha", "Gamma", "Public", "granter", "sql_lab"]
            default_role = "Admin"

            @self.appbuilder.sm.oid.require_login
            def handle_login():
                user = sm.auth_user_oid(oidc.user_getfield('email'))
                if user is None:
                    info = oidc.user_getinfo(['preferred_username', 'given_name', 'family_name', 'email', 'roles'])
                    roles = [role for role in superset_roles if role in info.get('roles', [])]
                    roles += [default_role, ] if not roles else []
                    user = sm.add_user(info.get('preferred_username'), info.get('given_name', ''), info.get('family_name', ''),
                                       info.get('email'), [sm.find_role(role) for role in roles])
                login_user(user, remember=False)
                return redirect(self.appbuilder.get_url_for_index)

            return handle_login()

        @expose('/logout/', methods=['GET', 'POST'])
        def logout(self):
            oidc = self.appbuilder.sm.oid
            oidc.logout()
            super(AuthOIDCView, self).logout()
            redirect_url = request.url_root.strip('/')
            # redirect_url = request.url_root.strip('/') + self.appbuilder.get_url_for_login
            return redirect(
                oidc.client_secrets.get('issuer') + '/protocol/openid-connect/logout?redirect_uri=' + quote(redirect_url))

On authentication, the user is redirected back to Superset.
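
The role assignment performed in the login view for first-time users can be looked at in isolation. The sketch below reproduces that selection logic as a standalone function so it can be tried without Superset; the function name resolve_roles is hypothetical, and the role list and default are copied from the view above.

```python
def resolve_roles(claimed_roles, superset_roles, default_role):
    """Keep the known Superset roles found in the token, else use the default."""
    claimed = set(claimed_roles or [])
    # Preserve the order of superset_roles, as the list comprehension in the view does.
    roles = [role for role in superset_roles if role in claimed]
    return roles or [default_role]


if __name__ == "__main__":
    known = ["Admin", "Alpha", "Gamma", "Public", "granter", "sql_lab"]
    print(resolve_roles(["Gamma", "offline_access"], known, "Admin"))  # ['Gamma']
    print(resolve_roles([], known, "Admin"))                           # ['Admin']
```

With default_role set to "Admin", a user whose token carries none of the listed roles is created as an administrator, so a more restrictive default may be worth considering for your environment.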

Configure Superset authentication

Finally, add the following parameters to the superset .yml file:

    import os
    from flask_appbuilder.security.manager import AUTH_OID
    # Assumes keycloak_security_manager.py (defined above) is on the Python path.
    from keycloak_security_manager import OIDCSecurityManager

    '''
    ---------------------------KEYCLOAK ----------------------------
    '''
    curr = os.path.abspath(os.getcwd())
    AUTH_TYPE = AUTH_OID
    OIDC_CLIENT_SECRETS = curr + '/pythonpath/client_secret.json'
    OIDC_ID_TOKEN_COOKIE_SECURE = True
    OIDC_REQUIRE_VERIFIED_EMAIL = True
    OIDC_OPENID_REALM = 'flowx'
    OIDC_INTROSPECTION_AUTH_METHOD = 'client_secret_post'
    CUSTOM_SECURITY_MANAGER = OIDCSecurityManager
    AUTH_USER_REGISTRATION = False
    AUTH_USER_REGISTRATION_ROLE = 'Admin'
    OVERWRITE_REDIRECT_URI = 'https://{{ .Values.flowx.ingress.reporting }}/oidc_callback'
    '''
    --------------------------------------------------------------
    '''
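
OIDC_CLIENT_SECRETS points to a client_secret.json file that this page does not show. The sketch below illustrates what such a file might contain, assuming the "web" layout that flask-oidc reads and Keycloak's standard OpenID Connect endpoint paths under the realm URL. The helper name build_client_secrets and all URLs, client IDs, and secrets are placeholders to replace with the values from your Keycloak client.

```python
import json


def build_client_secrets(issuer, client_id, client_secret, redirect_uri):
    """Build a flask-oidc style client secrets document for a Keycloak realm."""
    issuer = issuer.rstrip("/")
    oidc = issuer + "/protocol/openid-connect"
    return {
        "web": {
            "issuer": issuer,
            "auth_uri": oidc + "/auth",
            "token_uri": oidc + "/token",
            "userinfo_uri": oidc + "/userinfo",
            "token_introspection_uri": oidc + "/token/introspect",
            "client_id": client_id,
            "client_secret": client_secret,
            "redirect_uris": [redirect_uri],
        }
    }


if __name__ == "__main__":
    secrets = build_client_secrets(
        issuer="https://keycloak.example.com/realms/flowx",  # placeholder realm URL
        client_id="superset",                                # placeholder client ID
        client_secret="change-me",                           # placeholder secret
        redirect_uri="https://reporting.example.com/oidc_callback",
    )
    print(json.dumps(secrets, indent=2))
```

The issuer value matters for logout as well, because the custom view appends /protocol/openid-connect/logout to it.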
