## Dependencies

The reporting plugin, available as a Docker image, requires the following dependencies:

- PostgreSQL: a dedicated instance for reporting data storage.
- Reporting-plugin Helm chart:
  - Utilizes a Spark Application to extract data from the FLOWX.AI Engine database and populate the Reporting plugin database.
  - Utilizes the Spark Operator (more info here).
- Superset:
  - Requires a dedicated PostgreSQL database for its operation.
  - Utilizes Redis for efficient caching.
  - Exposes its user interface via an ingress.
## Reporting plugin Helm chart configuration

Configuring the reporting plugin involves several steps:

### Installation of the Spark Operator

- Install the Spark Operator using Helm:

  ```bash
  helm install local-spark-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set webhook.enable=true \
    --set logLevel=6
  ```

- Apply the RBAC configuration:

  ```bash
  kubectl apply -f spark-rbac.yaml
  ```

- Build the reporting image:

  - Update the `reporting-image` URL in the `spark-app.yml` file.
  - Configure the correct database environment variables in the `spark-app.yml` file (check them in the deployment examples below, with/without webhook).
  - Deploy the application:

    ```bash
    kubectl apply -f operator/spark-app.yaml
    ```
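The contents of `spark-rbac.yaml` are not shown in this guide. As a minimal sketch, such a manifest typically creates the `spark` service account referenced later in the `ScheduledSparkApplication` driver spec, plus a role allowing the driver to manage executor pods; the namespace and resource names below are assumptions to align with your own deployment:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark        # referenced by the driver's serviceAccount field
  namespace: dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: dev
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: dev
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```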
### Spark Operator deployment options

#### Without webhook

For deployments without the webhook, credentials are passed directly as environment variables:

```yaml
sparkApplication:
  enabled: "true"
  schedule: "@every 5m"
  driver:
    envVars:
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
      ENGINE_DATABASE_PASSWORD: "password"
      REPORTING_DATABASE_PASSWORD: "password"
  executor:
    envVars:
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
      ENGINE_DATABASE_PASSWORD: "password"
      REPORTING_DATABASE_PASSWORD: "password"
```
NOTE: The passwords above are set as plain strings, which is not a secure practice for a production environment.
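A safer alternative is to store the passwords in a Kubernetes Secret. For example, the `postgresql-generic` secret with a `postgresql-password` key, as referenced by the webhook-based configuration below, could be created from a manifest like this (the value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgresql-generic
type: Opaque
stringData:
  postgresql-password: "password"   # placeholder, replace with the real password
```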
#### With webhook

When using the webhook, reference the passwords from Kubernetes secrets instead of plain environment variables:

```yaml
sparkApplication:
  enabled: "true"
  schedule: "@every 5m"
  driver:
    env:
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
    extraEnvVarsMultipleSecretsCustomKeys:
      - name: postgresql-generic
        secrets:
          ENGINE_DATABASE_PASSWORD: postgresql-password
          REPORTING_DATABASE_PASSWORD: postgresql-password
  executor:
    env:
      ENGINE_DATABASE_USER: flowx
      ENGINE_DATABASE_URL: postgresql:5432
      ENGINE_DATABASE_NAME: process_engine
      ENGINE_DATABASE_TYPE: postgres
      REPORTING_DATABASE_USER: flowx
      REPORTING_DATABASE_URL: postgresql:5432
      REPORTING_DATABASE_NAME: reporting
    extraEnvVarsMultipleSecretsCustomKeys:
      - name: postgresql-generic
        secrets:
          ENGINE_DATABASE_PASSWORD: postgresql-password
          REPORTING_DATABASE_PASSWORD: postgresql-password
```
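Assuming the chart renders `extraEnvVarsMultipleSecretsCustomKeys` into standard `secretKeyRef` entries (an assumption about this chart's templates, not verified here), the resulting container spec would look roughly like:

```yaml
env:
  - name: ENGINE_DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgresql-generic       # the Secret's name
        key: postgresql-password       # the key inside that Secret
  - name: REPORTING_DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgresql-generic
        key: postgresql-password
```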
In Kubernetes-based Spark deployments managed by the Spark Operator, you can define the sparkApplication configuration to customize the behavior, resources, and environment for both the driver and executor components of Spark jobs. The driver section allows fine-tuning of parameters specifically pertinent to the driver part of the Spark application.
Below are the configurable values within the chart `values.yml` file (with webhook):
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: reporting-plugin-spark-app
  namespace: dev
  labels:
    app.kubernetes.io/component: reporting
    app.kubernetes.io/instance: reporting-plugin
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: reporting-plugin
    app.kubernetes.io/release: 0.0.1-FLOWXRELEASE
    app.kubernetes.io/version: 0.0.1-FLOWXVERSION
    helm.sh/chart: reporting-plugin-0.1.1-PR-9-4-20231122153650-e
spec:
  schedule: '@every 5m'
  concurrencyPolicy: Forbid
  template:
    type: Python
    pythonVersion: "3"
    mode: cluster
    image: eu.gcr.io/prj-cicd-d-flowxai-jx-6401/reporting-plugin:0.1.1-PR-9-4-20231122153650-eb6c
    imagePullPolicy: IfNotPresent
    mainApplicationFile: local:///opt/spark/work-dir/main.py
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
      onFailureRetries: 0
      onFailureRetryInterval: 10
      onSubmissionFailureRetries: 5
      onSubmissionFailureRetryInterval: 20
    driver:
      cores: 1
      coreLimit: 1200m
      memory: 512m
      labels:
        version: 3.1.1
      serviceAccount: spark
      env:
        ENGINE_DATABASE_USER: flowx
        ENGINE_DATABASE_URL: postgresql:5432
        ENGINE_DATABASE_NAME: process_engine
        ENGINE_DATABASE_TYPE: postgres
        REPORTING_DATABASE_USER: flowx
        REPORTING_DATABASE_URL: postgresql:5432
        REPORTING_DATABASE_NAME: reporting
        ENGINE_DATABASE_PASSWORD: "password"
        REPORTING_DATABASE_PASSWORD: "password"
      extraEnvVarsMultipleSecretsCustomKeys:
        - name: postgresql-generic
          secrets:
            ENGINE_DATABASE_PASSWORD: postgresql-password
            REPORTING_DATABASE_PASSWORD: postgresql-password
    executor:
      cores: 1
      instances: 3
      memory: 512m
      labels:
        version: 3.1.1
      env:
        ENGINE_DATABASE_USER: flowx
        ENGINE_DATABASE_URL: postgresql:5432
        ENGINE_DATABASE_NAME: process_engine
        ENGINE_DATABASE_TYPE: postgres
        REPORTING_DATABASE_USER: flowx
        REPORTING_DATABASE_URL: postgresql:5432
        REPORTING_DATABASE_NAME: reporting
      extraEnvVarsMultipleSecretsCustomKeys:
        - name: postgresql-generic
          secrets:
            ENGINE_DATABASE_PASSWORD: postgresql-password
            REPORTING_DATABASE_PASSWORD: postgresql-password
```
## Superset configuration

For in-depth configuration details, refer to the Superset documentation.
### Post-installation steps

After installation, perform the following essential configurations.

#### Datasource configuration

For document-related data storage, configure these environment variables:

- `SPRING_DATASOURCE_URL`
- `SPRING_DATASOURCE_USERNAME`
- `SPRING_DATASOURCE_PASSWORD`

Ensure the connection details are accurate to prevent startup errors. The Liquibase script manages the database schema and migrations.
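For illustration only, reusing the database details from the Spark examples above (your host, database name, and credentials will differ):

```yaml
SPRING_DATASOURCE_URL: jdbc:postgresql://postgresql:5432/reporting
SPRING_DATASOURCE_USERNAME: flowx
SPRING_DATASOURCE_PASSWORD: "password"   # placeholder; prefer a secret reference
```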
#### Redis configuration

Set the following variables to the corresponding Redis connection values:

- `SPRING_REDIS_HOST`
- `SPRING_REDIS_PORT`
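For example (the host name is a placeholder for your Redis service; 6379 is the default Redis port):

```yaml
SPRING_REDIS_HOST: redis-master   # placeholder service name
SPRING_REDIS_PORT: "6379"
```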
#### Keycloak configuration

To implement alternative user authentication:

- Override `AUTH_TYPE` in your `superset.yml` configuration file.
- Provide the reference to your `openid-connect` realm:

  ```yaml
  OIDC_OPENID_REALM: 'flowx'
  ```

With this configuration, the login page changes to a prompt where the user can select the desired OpenID provider.
#### Extend the security manager

First, make sure that Flask stops using `flask-openid` and starts using `flask-oidc` instead. To do so, create your own security manager that configures `flask-oidc` as its authentication provider:

```yaml
extraSecrets:
  keycloak_security_manager.py: |
    from flask_appbuilder.security.manager import AUTH_OID
    from superset.security import SupersetSecurityManager
    from flask_oidc import OpenIDConnect

    class OIDCSecurityManager(SupersetSecurityManager):
        def __init__(self, appbuilder):
            super(OIDCSecurityManager, self).__init__(appbuilder)
            # When OpenID is the configured auth type, use flask-oidc as the provider.
            if self.auth_type == AUTH_OID:
                self.oid = OpenIDConnect(self.appbuilder.get_app)
            # Replace the default OpenID view with the custom one defined below.
            self.authoidview = AuthOIDCView
```

To enable OpenID in Superset, you would previously have had to set the authentication type to `AUTH_OID`. The security manager still executes all the behavior of the superclass, but overrides the OID attribute with the `OpenIDConnect` object.
Further, it replaces the default OpenID authentication view with a custom one:

```python
from flask_appbuilder.security.views import AuthOIDView
from flask_login import login_user
from urllib.parse import quote
from flask_appbuilder.views import expose
from flask import request, redirect


class AuthOIDCView(AuthOIDView):

    @expose('/login/', methods=['GET', 'POST'])
    def login(self, flag=True):
        sm = self.appbuilder.sm
        oidc = sm.oid
        superset_roles = ["Admin", "Alpha", "Gamma", "Public", "granter", "sql_lab"]
        default_role = "Admin"

        @self.appbuilder.sm.oid.require_login
        def handle_login():
            user = sm.auth_user_oid(oidc.user_getfield('email'))

            if user is None:
                info = oidc.user_getinfo(['preferred_username', 'given_name', 'family_name', 'email', 'roles'])
                roles = [role for role in superset_roles if role in info.get('roles', [])]
                roles += [default_role, ] if not roles else []
                user = sm.add_user(info.get('preferred_username'), info.get('given_name', ''),
                                   info.get('family_name', ''), info.get('email'),
                                   [sm.find_role(role) for role in roles])

            login_user(user, remember=False)
            return redirect(self.appbuilder.get_url_for_index)

        return handle_login()

    @expose('/logout/', methods=['GET', 'POST'])
    def logout(self):
        oidc = self.appbuilder.sm.oid
        oidc.logout()
        super(AuthOIDCView, self).logout()
        redirect_url = request.url_root.strip('/')
        return redirect(
            oidc.client_secrets.get('issuer') + '/protocol/openid-connect/logout?redirect_uri=' + quote(redirect_url))
```
On authentication, the user is redirected back to Superset.
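The role-mapping step inside `handle_login` can be illustrated in isolation. This standalone sketch reuses the role names from the snippet above; `map_roles` is a hypothetical helper, not part of the actual view:

```python
# Roles that exist in Superset; only these are accepted from the Keycloak token.
superset_roles = ["Admin", "Alpha", "Gamma", "Public", "granter", "sql_lab"]
default_role = "Admin"


def map_roles(token_roles):
    """Keep known Superset roles; fall back to the default role when none match."""
    roles = [role for role in superset_roles if role in token_roles]
    roles += [default_role] if not roles else []
    return roles


print(map_roles(["Gamma", "uma_authorization"]))  # ['Gamma']
print(map_roles([]))                              # ['Admin']
```

Note that Keycloak tokens often carry extra realm roles (such as `uma_authorization`) that Superset does not know about; the filter above silently drops them.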
Finally, add some parameters to the `superset.yml` file:
```python
import os

'''
---------------------------KEYCLOAK ----------------------------
'''
curr = os.path.abspath(os.getcwd())
AUTH_TYPE = AUTH_OID
OIDC_CLIENT_SECRETS = curr + '/pythonpath/client_secret.json'
OIDC_ID_TOKEN_COOKIE_SECURE = True
OIDC_REQUIRE_VERIFIED_EMAIL = True
OIDC_OPENID_REALM = 'flowx'
OIDC_INTROSPECTION_AUTH_METHOD = 'client_secret_post'
CUSTOM_SECURITY_MANAGER = OIDCSecurityManager
AUTH_USER_REGISTRATION = False
AUTH_USER_REGISTRATION_ROLE = 'Admin'
OVERWRITE_REDIRECT_URI = 'https://{{ .Values.flowx.ingress.reporting }}/oidc_callback'
'''
--------------------------------------------------------------
'''
```
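The `OIDC_CLIENT_SECRETS` file referenced above follows the flask-oidc client-secrets format. A sketch with placeholder values (hosts, client ID, and secret are assumptions; adjust the paths if your Keycloak uses the older `/auth/realms/...` prefix):

```json
{
  "web": {
    "issuer": "https://<keycloak-host>/realms/flowx",
    "auth_uri": "https://<keycloak-host>/realms/flowx/protocol/openid-connect/auth",
    "client_id": "<superset-client-id>",
    "client_secret": "<client-secret>",
    "redirect_uris": ["https://<superset-host>/oidc_callback"],
    "userinfo_uri": "https://<keycloak-host>/realms/flowx/protocol/openid-connect/userinfo",
    "token_uri": "https://<keycloak-host>/realms/flowx/protocol/openid-connect/token",
    "token_introspection_uri": "https://<keycloak-host>/realms/flowx/protocol/openid-connect/token/introspect"
  }
}
```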