System Architecture and Requirements / Scaling the ELMA365 Enterprise cluster

Scaling the ELMA365 Enterprise cluster

The diagram in this article displays the protocols and ports used for communication between the servers in ELMA365. The information provided on the diagram can also be used for deploying and scaling ELMA365 fault-tolerant clusters.

 

Data storage services (PostgreSQL, MongoDB, RabbitMQ, Redis, Minio S3 Deployment) have specific rules of deployment and scaling. You can read more in the “Fault-tolerant server architecture” section of the System Requirements for ELMA365 Enterprise edition article.

 

ELMA365 consists of multiple microservices isolated in containers all managed by Kubernetes container orchestrator (the current ELMA 365 On-premises distribution configured to use MicroK8s package, by default).

clip0060

 

The list of the ELMA365 cluster internal services

 

auth

Authorization and groups, users, and org chart management

balancer

Multi-tenancy management

calculator

Calculating different values in-app items

chat

Private and group messages

collector

App items read and filter

convertik

Office formats to pdf converting

deploy

Migration management

diskjockey

Files and directories

docflow

Document workflow. Approval and review.

event-bus

System events monitoring and processing via events bus

feeder

Activity stream / channels

front

ELMA365 Front end

integrations

External integrations

mailer

Email management

main

API gateway

messengers*

Live chats and external messengers management

notifier*

Notifications and web-sockets

processor

Process management

picasso

EDS fingerprint management

scheduler*

Schedules, tasks delayed start, time reports

settings

User profile and system profile management

templater

Text and office documents templater

vahter

Users management in a multi-tenancy system

web-forms

Web forms for external systems management

widget

Widget management (interface customization)

worker

Validating, transpiling, and user scripting

 

* - the service cannot be executed in multiple instances

 

Fault Tolerance and Scaling of individual services

 

Distributed Services Architecture allows the solution to scale more flexibly depending on the load profile of the system. By default, all cluster services (except scheduler, notifier, messengers) in the ELMA365 Enterprise cluster are performed in two instances (for ELMA365 Standard edition - in a single instance).

 

If the cluster combines multiple servers, the microk8s orchestrator tries to evenly distribute the instances of the services among the servers. If either server in the cluster goes down, the orchestrator determines the services interrupted by the failure and replicates them on another server.

 

It is recommended to run the ELMA365 cluster on at least three nodes to ensure fault tolerance. 

Nodes work together, and the cluster permanently checks if all of them are up and running. The server is considered to be “down” if it is unreachable through a network for a specified time. The cluster is fully functional until at least two nodes are up and running. 

 

Adding servers without proper cluster reconfiguration will not automatically solve the problem of increased load. This approach only decreases the chance of service global failure. To leverage cluster throughput you can upgrade any distinct server hardware to a relatively great extent ( handling up to 10000 simultaneous transactions per server).

 

However, often we can identify a workload bottleneck and eliminate it by scaling the corresponding service. Let’s take a look at the worker script executing service. This service is resource-hungry but runs in parallel very well. If your server is heavily used for server-side script execution then it is reasonable to configure a high replication factor for this service (above 2). The orchestrator will then create additional instances of the service to perform the scripts faster in parallel.

 

In the same way, you can configure multi-instances parallel execution for any other service. However, to understand what services should be scaled, it is necessary to analyze the workload profile of a particular configuration at a specific time.