Indexing strategy
- Advantages of Multiple Small Indices:
- Fast indexing process.
- Flexibility in cleaning up old data.
- Potential Drawbacks:
- Hitting the maximum number of shards per node, resulting in exceptions when creating new indices.
- Increased search response time and memory footprint.
- Deletion
- When deleting data in Elasticsearch, itβs recommended to delete entire indices instead of individual documents. Creating multiple smaller indices provides the flexibility to delete entire indices of old data that are no longer needed.
What is indexing?
Shard and replica configuration
The solution includes an index template that gets created with the settings from the process-engine app (name, shards, replicas) when running the app for the first time. This template controls the settings and mapping of all newly created indices.What is sharding?
Index template
Recommendations for resource management
To manage functional indexing operations and resources efficiently, consider the following recommendations:- Sizing indexes upon creation
- Balancing
- Delete unneeded indices
- Reindex large indices
- Force merge indices
- Shrink indices
- Combine indices
Sizing indexes upon creation
Recommendations:- Start with monthly indexes that have 2 shards and 1 replica. This setup is typically sufficient for handling up to 200k process instances per day; ensures a parallel indexing in two main shards and has also 1 replica per each main shard (4 shards in total). This would create 48 shards per year in the Elasticsearch nodes; A lot less than the default 1000 shards, so you will have enough space for other indexes as well.
- If you observe that the indexing gets really, really slow, then you should look at the physical resources / shard size and start adapting the config.
- If you observe that indexing one monthly index gets massive and affects the performance, then think about switching to weekly indices.
- If you have huge spikes of parallel indexing load (even though that depends on the Kafka connect cluster configuration), then think about adding more main shards.
- Consider having at least one replica for high availability. However, keep in mind that the number of replicas is applied to each shard, so creating many replicas may lead to increased resource usage.
- Monitor the number of shards created and estimate when you might reach the maximum shards per node, taking into account the number of nodes in your cluster.