Integration Options
Use Case 1: Monitor an Onboarded Service (central)
Introduction
This use case covers monitoring a service onboarded to EOSC via the Providers Portal. The results become available in the EOSC Exchange Monitoring WebUI (https://argo.eosc-portal.eu). Before an onboarded service can be monitored, a few requirements must be met: in addition to the basic information given during onboarding, the service provider must supply some extra information needed by the ARGO monitoring service, as described in the section below.
Solution
In order to start monitoring a service, a customer should follow the steps described below.
Step 1: Onboard the service
Before the service can be monitored, it must be onboarded to the EOSC Portal. The onboarding procedure is described in detail on the EOSC Portal onboarding process wiki page.
Step 2: Provide additional info for monitoring
Once the service has been successfully onboarded, the ARGO monitoring service needs some additional information. First and foremost, probes and metrics must be associated with the service.
Once the service provider has chosen the probes/metrics to use, the metrics should be mapped to the service in the EOSC-Exchange metric profile. After the metric profile, the aggregation and thresholds profiles should also be updated.
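The relationship between the metric and aggregation profiles can be illustrated with a small consistency check. This is a minimal sketch, not the ARGO data model: the dictionary layout, the profile names and the helper `services_without_metrics` are hypothetical; only the service and metric names are taken from the example message shown in Use Case 3. The check lists services that the aggregation profile refers to but for which no metrics are mapped in the metric profile.

```python
# Hypothetical, simplified profile structures (not the official ARGO schema).
metric_profile = {
    "name": "EXAMPLE_METRIC_PROFILE",  # hypothetical profile name
    "services": [
        {"service": "eu.eosc.portal.services.url",
         "metrics": ["org.nagios.WebCheck"]},
    ],
}

aggregation_profile = {
    "name": "EXAMPLE_AGGREGATION_PROFILE",  # hypothetical profile name
    "groups": [
        {"name": "web", "operation": "OR",
         "services": [{"name": "eu.eosc.portal.services.url", "operation": "OR"}]},
    ],
}


def services_without_metrics(metric_prof, aggr_prof):
    """Return services referenced in the aggregation profile that have
    no metrics mapped to them in the metric profile."""
    mapped = {entry["service"] for entry in metric_prof["services"] if entry["metrics"]}
    referenced = {
        service["name"]
        for group in aggr_prof["groups"]
        for service in group["services"]
    }
    return sorted(referenced - mapped)


if __name__ == "__main__":
    print(services_without_metrics(metric_profile, aggregation_profile))
```

An empty list means every service in the aggregation profile has at least one metric mapped to it.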
Step 3: Start monitoring
Once all the information has been provided, monitoring of the service starts: the ARGO monitoring Computation and Analytics component calculates the availability and reliability (A/R) of the service and creates a report.
The service provider can view the A/R and status results in the EOSC-Exchange Monitoring UI.
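In ARGO-style monitoring, availability is usually the fraction of time an item was up, excluding periods with UNKNOWN status, while reliability additionally excludes scheduled downtime. The sketch below applies those definitions to a timeline of status periods. Treating OK and WARNING as "up", and the exact state names, are assumptions made for illustration; the production Computation and Analytics component may treat missing data and other states differently.

```python
from datetime import timedelta

UP_STATES = {"OK", "WARNING"}  # assumption: WARNING still counts as up


def availability_reliability(timeline):
    """Compute (availability, reliability) from (status, duration) pairs.

    availability = up / (total - unknown)
    reliability  = up / (total - unknown - downtime)
    A ratio whose denominator is zero is returned as None.
    """
    total = up = unknown = downtime = timedelta()
    for status, duration in timeline:
        total += duration
        if status in UP_STATES:
            up += duration
        elif status == "UNKNOWN":
            unknown += duration
        elif status == "DOWNTIME":
            downtime += duration
    known = total - unknown
    scheduled_known = known - downtime
    availability = up / known if known else None
    reliability = up / scheduled_known if scheduled_known else None
    return availability, reliability


if __name__ == "__main__":
    day = [
        ("OK", timedelta(hours=20)),
        ("CRITICAL", timedelta(hours=2)),
        ("UNKNOWN", timedelta(hours=1)),
        ("DOWNTIME", timedelta(hours=1)),
    ]
    a, r = availability_reliability(day)
    print(f"A={a:.2%} R={r:.2%}")
```

For the sample day this prints `A=86.96% R=90.91%`: the hour of scheduled downtime lowers availability but not reliability.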
Use Case 2: Monitor an Infrastructure (community)
Introduction
Use Case 2 covers the scenario in which the monitoring requirements of an infrastructure cannot be met by EOSC-Exchange Monitoring, for example when one or more of the following are required:
- defining a custom topology and aggregation of monitored endpoints
- selecting from the existing range of probes and adding custom ones
- managing profiles and metrics for different services
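Custom aggregation typically means combining endpoint statuses into a service status, and service statuses into a group status, using logical operations. The sketch below uses a simplified rule that is an assumption, not ARGO's exact truth tables: with OR the best status wins (redundant endpoints), with AND the worst status wins (every member is needed). The severity ordering is also illustrative.

```python
# Illustrative severity ordering, best first (an assumption; ARGO uses
# configurable operations profiles with additional states such as MISSING).
SEVERITY = ["OK", "WARNING", "UNKNOWN", "CRITICAL"]


def combine(statuses, operation):
    """Combine member statuses: OR keeps the best, AND keeps the worst."""
    ranks = [SEVERITY.index(status) for status in statuses]
    if not ranks:
        raise ValueError("nothing to combine")
    if operation == "OR":
        return SEVERITY[min(ranks)]
    if operation == "AND":
        return SEVERITY[max(ranks)]
    raise ValueError(f"unsupported operation: {operation}")


if __name__ == "__main__":
    # Two redundant endpoints of one service: one healthy endpoint is enough.
    service_a = combine(["CRITICAL", "OK"], "OR")
    service_b = combine(["WARNING"], "OR")
    # The group needs all of its services, so the worst service status wins.
    print(service_a, service_b, combine([service_a, service_b], "AND"))
```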
Solution
In order to start monitoring an infrastructure, an Infrastructure Manager should follow the steps described below.
Step 1: EOSC Helpdesk request
The Infrastructure Manager opens a ticket on the EOSC Helpdesk requesting the creation of an ARGO Monitoring instance for the new infrastructure. The ticket should provide at least the following information:
- Infrastructure topology
- Personnel responsible for managing profiles
- URLs for POEM and UI components
Step 2: ARGO team initial actions
The ARGO team creates a new tenant based on the provided information and replies to the initial request once all instances are ready for use.
Step 3: Define initial monitoring profiles
The minimum set of definitions required before monitoring can start:
- A list of metrics, selected from the metric repository
- Metric Profile
- Aggregation Profile
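Because the metric profile may only use metrics selected from the metric repository, a quick pre-flight check can compare the two. In this sketch the repository is a plain set of metric names and the profile layout is hypothetical; the metric names are the AMS checks from the example in Use Case 5, plus a deliberately misspelled one.

```python
def metrics_not_in_repository(metric_profile, repository):
    """Return (service, metric) pairs whose metric is not in the repository."""
    return [
        (entry["service"], metric)
        for entry in metric_profile["services"]
        for metric in entry["metrics"]
        if metric not in repository
    ]


if __name__ == "__main__":
    repository = {"cert_validity_check", "ams_check"}  # hypothetical contents
    profile = {  # hypothetical layout
        "name": "EXAMPLE_TENANT_PROFILE",
        "services": [
            {"service": "AMS", "metrics": ["cert_validity_check", "ams_chek"]},
        ],
    }
    print(metrics_not_in_repository(profile, repository))
```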
Step 4: Start monitoring
Once all the information has been provided, monitoring of the infrastructure starts: the ARGO monitoring Computation and Analytics component calculates the availability and reliability of its services and creates a report. The Infrastructure Manager can view the A/R and status results in the dedicated UI. Monitoring new services is described in Use Case 1.
Use Case 3: Integrate an External Monitoring Service
Introduction
To scale out and take advantage of existing monitoring systems, the EOSC Monitoring Service can accept data from external sources, that is, from other monitoring engines that want to connect to the EOSC Monitoring Service. This use case is split into two cases:
- Case 3.1: Supported Monitoring Engine and Operating System (Nagios on CentOS 7 or Debian 8)
- Case 3.2: Other Monitoring Engine and Operating System
Solution
Connecting a monitoring system to EOSC relies mainly on the metric data, which must carry the information needed to create the final report. In this use case an external monitoring system replaces the internal monitoring engine and is therefore responsible for the validity of the monitoring data it publishes.
Step 1: EOSC helpdesk
The interested party opens a ticket on the EOSC Helpdesk requesting to start the process of connecting to the EOSC Monitoring Service. While preparing the request, they should get ready to provide the following information:
- The type of system used
- Infrastructure topology
- Personnel responsible for managing the necessary profiles
- URLs for POEM and UI components
Step 2: The monitoring team creates a new tenant
The monitoring team creates a new tenant on the monitoring service and, at the same time, asks the messaging team to create the necessary configuration on the EOSC Messaging Service. The team then sends the customer the instructions and access tokens needed to connect to the Monitoring Service.
Step 3: The monitoring team assists the interested party in creating the necessary profiles
The profiles that need to be defined are:
- Metric Profile
- Aggregation Profile
Step 4: Publish Metric Data
The customer needs to configure their monitoring engine to publish metric data via the EOSC Messaging Service. The EOSC Monitoring Service supports two options:
Case 3.1: Supported Monitoring Engine and Operating System (Nagios on CentOS 7 or Debian 8)
If the customer uses Nagios as their monitoring tool, EOSC Monitoring offers the argo-nagios-ams-publisher tool, currently supported on CentOS 7 and Debian 8. argo-nagios-ams-publisher acts as a bridge from Nagios to the ARGO Messaging system and, from there, to the ARGO Monitoring Engine. It is responsible for forming and dispatching messages that wrap up results from the monitoring engine. To use this solution, the customer needs to:
- Install argo-nagios-ams-publisher and ams-library
- Configure argo-nagios-ams-publisher
- Enable OCSP in Nagios by adding the following to /etc/nagios/nagios.cfg:
    obsess_over_services=1
    ocsp_command=argo_service_check
    ocsp_timeout=15
- Add the OCSP command to /etc/nagios/objects/commands.cfg:
    define command {
        command_name    argo_service_check
        command_line    /usr/bin/ams-metric-to-queue --queue /var/spool/argo-nagios-ams-publisher/metrics/ --hostname "$HOSTNAME$" --status "$SERVICESTATE$" --summary "$SERVICEOUTPUT$" --message "$LONGSERVICEOUTPUT$" --servicestatetype "$SERVICESTATETYPE$" --actual_data "$SERVICEPERFDATA$" --service "$_SERVICESERVICE_FLAVOUR$" --metric "$_SERVICEMETRIC_NAME$"
    }
- Set the _service_flavour and _metric_name custom attributes on every service to be published; the OCSP command passes them on through the $_SERVICESERVICE_FLAVOUR$ and $_SERVICEMETRIC_NAME$ macros:
    define service {
        use                   generic-service   ; name of the service template to use
        host_name             grnet.gr
        service_description   HTTP
        check_command         check_http
        check_interval        5
        _service_flavour      WebPortal         ; the service type
        _metric_name          org.nagios.WebCheck
    }
- Start argo-nagios-ams-publisher by executing:
    service ams-publisherd start
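Since the OCSP command reads `_service_flavour` and `_metric_name` through Nagios custom-variable macros, a service without them would likely reach the publisher with empty service or metric values. The sketch below scans a Nagios object file for `define service` blocks missing either attribute. It is a simplified, hypothetical helper written for illustration: it does not resolve template inheritance (attributes set via `use` are not followed) and does not handle escaped semicolons.

```python
import re

REQUIRED = ("_service_flavour", "_metric_name")
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.S)


def services_missing_attributes(cfg_text):
    """Return service_description values of 'define service' blocks that
    lack any REQUIRED custom attribute (template inheritance is ignored)."""
    missing = []
    for body in SERVICE_BLOCK.findall(cfg_text):
        attrs = {}
        for raw in body.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            parts = line.split(None, 1)
            if len(parts) == 2:
                # Nagios custom variable names are case-insensitive.
                attrs[parts[0].lower()] = parts[1].strip()
        if not all(attrs.get(name) for name in REQUIRED):
            missing.append(attrs.get("service_description", "<unnamed>"))
    return missing


if __name__ == "__main__":
    sample = """
define service {
    use                 generic-service ; template
    host_name           grnet.gr
    service_description HTTP
    check_command       check_http
    _service_flavour    WebPortal
    _metric_name        org.nagios.WebCheck
}
define service {
    host_name           grnet.gr
    service_description PING
    check_command       check_ping
}
"""
    print(services_missing_attributes(sample))
```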
Case 3.2: Other monitoring systems
In this case the client cannot, or does not want to, use the solution described in Case 3.1. The external monitoring system must then send its monitoring data (metric data) to the EOSC Monitoring Service itself, following a predefined format.
The data should be stamped with their source and timestamp. Every metric should be prefixed with [source_type], following the metric naming best practices, and is labelled with the hostname and service. These messages are sent to the EOSC Messaging Service, which passes them to the computation engine that performs the calculations needed to produce the reports. An example message:
{
  "hostname": "host101.example.com",
  "service": "eu.eosc.portal.services.url",
  "metric": "org.nagios.WebCheck",
  "timestamp": "2022-01-02T00:24:38Z",
  "status": "OK",
  "tags": {"endpoint_group": "GroupA"},
  "summary": "200 OK",
  "actual_data": "time=0.085796s;;;0.000000 size=1126B;;;0",
  "monitoring_host": "monbox.example.com",
  "message": "a more detailed message about the monitoring result"
}
Here monitoring_host is the name of the external monitoring box.
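A minimal Python sketch of how an external monitoring system might assemble such a message. The function name and the accepted status values are assumptions; the field names and sample values come from the example above. Sending the message to the EOSC Messaging Service is not shown, since the connection details and token are supplied by the monitoring team in Step 2.

```python
import json
from datetime import datetime, timezone

# Assumed status vocabulary; confirm the accepted values with the monitoring team.
STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


def build_metric_message(hostname, service, metric, status, summary,
                         monitoring_host, message=None, actual_data=None,
                         tags=None, when=None):
    """Return one metric-data message shaped like the example above."""
    if status not in STATUSES:
        raise ValueError(f"unexpected status: {status}")
    when = when or datetime.now(timezone.utc)
    return {
        "hostname": hostname,
        "service": service,
        "metric": metric,
        # UTC with second precision and a trailing 'Z', as in the example
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "status": status,
        "tags": tags,
        "summary": summary,
        "actual_data": actual_data,
        "monitoring_host": monitoring_host,
        "message": message,
    }


if __name__ == "__main__":
    msg = build_metric_message(
        hostname="host101.example.com",
        service="eu.eosc.portal.services.url",
        metric="org.nagios.WebCheck",
        status="OK",
        summary="200 OK",
        monitoring_host="monbox.example.com",
        tags={"endpoint_group": "GroupA"},
        when=datetime(2022, 1, 2, 0, 24, 38, tzinfo=timezone.utc),
    )
    print(json.dumps(msg, indent=2))
```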
Metric data comes in the form of Avro files (support for JSON files is currently in development) and contains timestamped status information about the hostname, service and specific checks (metrics) being monitored. Each item of metric data contains the fields defined in the schema below; argo.avro is the namespace currently supported.
{
  "namespace": "argo.avro",
  "type": "record",
  "name": "metric_data",
  "fields": [
    {"name": "timestamp", "type": "string"},
    {"name": "service", "type": "string"},
    {"name": "hostname", "type": "string"},
    {"name": "metric", "type": "string"},
    {"name": "status", "type": "string"},
    {"name": "monitoring_host", "type": ["null", "string"]},
    {"name": "summary", "type": ["null", "string"]},
    {"name": "message", "type": ["null", "string"]},
    {"name": "tags", "type": ["null", {"name": "Tags", "type": "map", "values": ["null", "string"]}]}
  ]
}
Table x: The accepted format of the schema.
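Before publishing, a client may want to check records against this schema. The sketch below is a hand-written, stdlib-only check covering just the constructs used in this schema (strings, nullable strings and a map of nullable strings); it is not a general Avro validator, and an Avro library would normally be used for actual serialization. Fields that are not in the schema, such as `actual_data` in the example message, are ignored here.

```python
# The Avro schema above, as a Python dict.
METRIC_DATA_SCHEMA = {
    "namespace": "argo.avro",
    "type": "record",
    "name": "metric_data",
    "fields": [
        {"name": "timestamp", "type": "string"},
        {"name": "service", "type": "string"},
        {"name": "hostname", "type": "string"},
        {"name": "metric", "type": "string"},
        {"name": "status", "type": "string"},
        {"name": "monitoring_host", "type": ["null", "string"]},
        {"name": "summary", "type": ["null", "string"]},
        {"name": "message", "type": ["null", "string"]},
        {"name": "tags", "type": ["null", {"name": "Tags", "type": "map",
                                           "values": ["null", "string"]}]},
    ],
}


def _is_optional_string(value):
    return value is None or isinstance(value, str)


def validate_metric(record, schema=METRIC_DATA_SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        value = record.get(name)
        nullable = isinstance(ftype, list) and "null" in ftype
        if value is None:
            if not nullable:
                problems.append(f"{name}: required field missing")
            continue
        # For a union, check against its non-null branch.
        base = [t for t in ftype if t != "null"][0] if isinstance(ftype, list) else ftype
        if base == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected a string")
        elif isinstance(base, dict) and base.get("type") == "map":
            if not isinstance(value, dict) or not all(
                isinstance(k, str) and _is_optional_string(v) for k, v in value.items()
            ):
                problems.append(f"{name}: expected a map of strings")
    return problems
```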
The monitoring team validates the published metric data against the supplied topology and performs a number of dry runs to make sure there are no issues with the supplied data. Once the monitoring team has validated the metric data, it becomes the main input for computing A/R and status results.
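One part of such a dry run can be pictured as a set comparison between the endpoints that appear in the published metric data and those in the supplied topology. The sketch below is illustrative only; the flat topology layout (one dict per endpoint with group, hostname and service) and the helper name are assumptions, not the format or tooling the monitoring team uses.

```python
def dry_run_report(messages, topology):
    """Compare (hostname, service) pairs in metric data with the topology."""
    known = {(ep["hostname"], ep["service"]) for ep in topology}
    seen = {(msg["hostname"], msg["service"]) for msg in messages}
    return {
        # data for endpoints the topology does not know about
        "unknown_endpoints": sorted(seen - known),
        # topology endpoints that sent no data during the dry run
        "silent_endpoints": sorted(known - seen),
    }


if __name__ == "__main__":
    topology = [  # hypothetical topology entries
        {"group": "GroupA", "hostname": "host101.example.com",
         "service": "eu.eosc.portal.services.url"},
        {"group": "GroupA", "hostname": "host102.example.com",
         "service": "eu.eosc.portal.services.url"},
    ]
    messages = [
        {"hostname": "host101.example.com", "service": "eu.eosc.portal.services.url"},
        {"hostname": "host999.example.com", "service": "eu.eosc.portal.services.url"},
    ]
    print(dry_run_report(messages, topology))
```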
Step 5: Start Monitoring
Once all the information has been provided, monitoring starts: the ARGO monitoring Computation and Analytics component calculates the availability and reliability of the services and creates a report. The Infrastructure Manager can view the A/R and status results in the dedicated UI. Monitoring new services is described in Use Case 1.
Use Case 4: Combine Results of Existing ARGO Tenants
Introduction
This use case covers scenarios in which the topology and results of multiple tenants need to be combined into one or more reports.
Prerequisites
To combine results from tenants A and B (example names), both tenants must already be monitored by the ARGO Monitoring Service, with the following in place for each tenant:
- Latest data available: each tenant must have an active stream of incoming monitoring data.
- Topology: each tenant must already have a well-defined topology source that includes lists of groups, endpoints and services.
- Metric Profile: in simple terms, a list of all services to be checked, along with all relevant metrics per service.
Solution
Step 1: Open a ticket to helpdesk
To obtain combined results, the customer creates a ticket on the helpdesk describing:
- Tenants to be used in the combined report
- Services and metrics
- Aggregation profile
For each tenant taking part in the combined results, all of the prerequisites listed in the previous section must be met.
Step 2: Creation of the combined tenant
Create a new tenant that will host the combined report. This tenant will act as a host tenant for the combined results and will rely on the data of the other tenants as input for the computations of the availability, reliability and status results.
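Conceptually, the combined tenant builds its topology from those of the participating tenants, restricted to the services requested in the ticket, while remembering where each endpoint came from. The sketch below shows that idea with hypothetical data structures, tenant names and helper name; it does not describe how ARGO stores tenant data internally.

```python
def combined_topology(tenant_topologies, services):
    """Merge per-tenant endpoint lists, keeping only the requested services
    and tagging each endpoint with the tenant it comes from."""
    return [
        {**endpoint, "tenant": tenant}
        for tenant, endpoints in sorted(tenant_topologies.items())
        for endpoint in endpoints
        if endpoint["service"] in services
    ]


if __name__ == "__main__":
    tenants = {  # hypothetical tenants A and B
        "A": [{"group": "SITE-1", "hostname": "a1.example.org", "service": "web"}],
        "B": [{"group": "SITE-2", "hostname": "b1.example.org", "service": "web"},
              {"group": "SITE-2", "hostname": "b2.example.org", "service": "storage"}],
    }
    for endpoint in combined_topology(tenants, {"web"}):
        print(endpoint)
```

Keeping the tenant tag lets each combined A/R or status value be traced back to the source tenant's data stream.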
Step 3: Start monitoring
Once all the information has been provided, monitoring starts: the ARGO monitoring Computation and Analytics component calculates the availability and reliability of the services and creates a report.
The user can view the A/R and status results of the combined reports in the UI.
Use Case 5: Third-party services exploiting EOSC Monitoring data
Introduction
This use case covers the scenario in which the customer needs to use the results of the EOSC Monitoring Service in an external service or dashboard.
The customer can access the following information via an API:
- A/R information about the service and its service components
- Status information about the service and its service components
- The topology and grouping of the service
Solution
Step 1: EOSC helpdesk
A user who wants access to this type of monitoring information receives a token with read-only access to the A/R and status results. To request it, the user opens a ticket on the EOSC Helpdesk, providing the monitoring team with:
- The name of the service that will use the information
- An email address for creating the user account
- The type of information required (A/R results, status results, or both)
Step 2: Start ingesting the data
The monitoring team provides the required token together with guidance on how to retrieve the information.
Example used
This example shows how the user can retrieve the availability, reliability and status of the AMS (ARGO Messaging Service, endpoint: https://msg.argo.grnet.gr) of the organisation GRNET.
The Monitoring Service checks the services at regular intervals by running explicit tests (checks) that assess their status; the results of the checks determine the status of the service. Status information is kept in reports that hold all the necessary information.
At the same time, the monitoring analytics engine draws conclusions about each monitored item, most importantly whether it is available for use and whether it can be considered reliable. To do this, it calculates availability and reliability values (hourly, daily, monthly). These results are also encapsulated in a report.
The EOSC Monitoring Service monitors the Messaging Service and performs the following checks:
- cert_validity_check: a metric that checks the validity of the certificate used by the service
- ams_check: a metric that checks a list of functionalities provided by the Messaging Service
Based on the above, the service is described as follows:
Definition | Value | Description |
GROUP | GRNET | A collection of services |
SERVICE | AMS | The type of one of the services in the collection |
SERVICE endpoint | msg.argo.grnet.gr (AMS) | The combination of a hostname and a service type (e.g. the AMS service type listening on port/s <ams-port/s> on host msg.argo.grnet.gr) |
Grouping used in the report | SERVICEGROUPS | The way services are organized in the monitoring engine (e.g. in groups of sites or groups of services) |
A/R report | Default | The place where the A/R results are provided. |
Status report | Default | The place where status results are provided. |
These are the values the user needs for the API calls.
API call examples for A/R reports
The API authenticates the user with the API key passed in the x-api-key header. Users can specify the time granularity (monthly or daily) of the retrieved results, and the response format through the Accept header. Depending on the form of the request, the user can request results for a group, a service or a service endpoint.
Detailed documentation: https://argoeu.github.io/api/v3/results/
Example
For the AMS, the API call to get the A/R results of the service group GRNET is:
Request for A/R results for service group GRNET
$ curl -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "x-api-key: secret-token" "https://api.argo.grnet.gr/api/v3/results/Default/SERVICEGROUPS/GRNET?start_time=2021-08-05T00:00:00Z&end_time=2021-08-05T23:59:59Z"
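The same request can be prepared from Python with the standard library. This sketch only builds the request; nothing is sent until `urllib.request.urlopen` is called on it. The helper name is hypothetical, and `secret-token` stands for the token supplied by the monitoring team. `urlencode` percent-encodes the colons in the timestamps, which the server is expected to decode.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://api.argo.grnet.gr/api/v3"


def build_argo_request(kind, report, group_type, group, start, end, token):
    """Build a GET request for A/R ('results') or 'status' data."""
    if kind not in ("results", "status"):
        raise ValueError(f"unsupported kind: {kind}")
    path = "/".join(quote(part, safe="") for part in (report, group_type, group))
    query = urlencode({"start_time": start, "end_time": end})
    return Request(
        f"{API_BASE}/{kind}/{path}?{query}",
        headers={"Accept": "application/json", "x-api-key": token},
    )


if __name__ == "__main__":
    req = build_argo_request("results", "Default", "SERVICEGROUPS", "GRNET",
                             "2021-08-05T00:00:00Z", "2021-08-05T23:59:59Z",
                             "secret-token")
    print(req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```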
API call examples for status reports
The API authenticates the user with the API key passed in the x-api-key header. Users can specify the time granularity (monthly or daily) of the retrieved results, and the response format through the Accept header. Depending on the form of the request, the user can request results for a group, a service or a service endpoint.
Detailed documentation: https://argoeu.github.io/api/v3/status/
Example
For the AMS, the API call to get the status of the service group GRNET is:
Request for status results for service group GRNET
$ curl -X GET -H "Accept: application/json" -H "Content-Type: application/json" -H "x-api-key: secret-token" "https://api.argo.grnet.gr/api/v3/status/Default/SERVICEGROUPS/GRNET?start_time=2021-08-05T00:00:00Z&end_time=2021-08-05T23:59:59Z"
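Status results describe a timeline of status changes rather than a single value. Assuming, for illustration, that the response contains time-sorted lists of entries shaped like `{"timestamp": ..., "value": ...}` (check the detailed documentation for the actual response structure), the status in effect at a given moment is the last change at or before it. The helper below is hypothetical and relies on all timestamps sharing the `YYYY-MM-DDTHH:MM:SSZ` format, so that string order matches time order.

```python
from bisect import bisect_right


def status_at(statuses, when):
    """Return the status in effect at `when`, or None if `when` is before
    the first entry. `statuses` must be sorted by timestamp."""
    stamps = [entry["timestamp"] for entry in statuses]
    index = bisect_right(stamps, when)
    return statuses[index - 1]["value"] if index else None


if __name__ == "__main__":
    timeline = [  # hypothetical sample data
        {"timestamp": "2021-08-05T00:00:00Z", "value": "OK"},
        {"timestamp": "2021-08-05T10:15:00Z", "value": "CRITICAL"},
        {"timestamp": "2021-08-05T10:45:00Z", "value": "OK"},
    ]
    print(status_at(timeline, "2021-08-05T10:30:00Z"))
```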