Monitoring v1
Important
Installing Prometheus and Grafana is beyond the scope of this project. We assume they are correctly installed on your system. However, for experimentation we provide instructions in Part 4 of the Quickstart.
Monitoring Instances
For each PostgreSQL instance, the operator provides an exporter of metrics for
Prometheus via HTTP or HTTPS, on port 9187, named `metrics`.
The operator comes with a predefined set of metrics, as well as a highly
configurable and customizable system to define additional queries via one or
more `ConfigMap` or `Secret` resources (see the
"User defined metrics" section below for details).
Important
EDB Postgres for Kubernetes, by default, installs a set of predefined metrics
in a `ConfigMap` named `default-monitoring`.
Info
You can inspect the exported metrics by following the instructions in the "How to inspect the exported metrics" section below.
All monitoring queries that are performed on PostgreSQL are:
- atomic (one transaction per query)
- executed with the `pg_monitor` role
- executed with `application_name` set to `cnp_metrics_exporter`
- executed as user `postgres`
Please refer to the "Predefined Roles" section in the PostgreSQL
documentation for details on the `pg_monitor` role.
Queries, by default, are run against the main database, as defined by
the specified bootstrap method of the `Cluster` resource, according
to the following logic:
- using `initdb`: queries will be run by default against the specified database in `initdb.database`, or `app` if not specified
- using `recovery`: queries will be run by default against the specified database in `recovery.database`, or `postgres` if not specified
- using `pg_basebackup`: queries will be run by default against the specified database in `pg_basebackup.database`, or `postgres` if not specified
The default database can always be overridden for a given user-defined metric
by specifying a list of one or more databases in the `target_databases` option.
Prometheus/Grafana
If you are interested in evaluating the integration of EDB Postgres for Kubernetes with Prometheus and Grafana, you can find a quick setup guide in Part 4 of the quickstart.
Prometheus Operator example
A specific PostgreSQL cluster can be monitored using the Prometheus Operator's resource `PodMonitor`.
A `PodMonitor` that correctly points to the Cluster can be automatically created by the operator by setting
`.spec.monitoring.enablePodMonitor` to `true` in the `Cluster` resource itself (default: `false`).
Important
Any change to the automatically created `PodMonitor` will be overridden by the
operator at the next reconciliation cycle. If you need to customize it, you can
do so as described below.
To deploy a `PodMonitor` for a specific Cluster manually, define it as follows and adjust as needed:
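A minimal sketch of such a `PodMonitor` follows; the name and the `cluster-example` label value are illustrative and must match your Cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-example
spec:
  selector:
    matchLabels:
      # select the pods of the target Cluster by the operator-managed label
      "k8s.enterprisedb.io/cluster": cluster-example
  podMetricsEndpoints:
    - port: metrics
```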
Important
Ensure you modify the example above with a unique name, as well as the
correct cluster's namespace and labels (e.g., `cluster-example`).
Important
The `postgresql` label, used in previous versions of this document, is deprecated
and will be removed in the future. Please use the `k8s.enterprisedb.io/cluster`
label instead to select the instances.
Enabling TLS on the Metrics Port
To enable TLS communication on the metrics port, set
`.spec.monitoring.tls.enabled` to `true`. This setup ensures that the metrics
exporter uses the same server certificate used by PostgreSQL to secure
communication on port 5432.
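A minimal sketch of a Cluster with TLS metrics enabled; the name and sizing are illustrative:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  monitoring:
    tls:
      # serve the metrics endpoint over HTTPS on port 9187
      enabled: true
  storage:
    size: 1Gi
```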
Important
Changing the `.spec.monitoring.tls.enabled` setting will trigger a rolling restart of the Cluster.
If the `PodMonitor` is managed by the operator (`.spec.monitoring.enablePodMonitor` set to `true`),
it will automatically contain the necessary configurations to access the metrics via TLS.
To manually deploy a `PodMonitor` suitable for reading metrics via TLS, define it as follows and
adjust as needed:
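A sketch of a TLS-enabled `PodMonitor`; the cluster name, and hence the CA secret name and `serverName`, are illustrative (the `<cluster-name>-ca` secret name assumes the operator's default certificates):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-example
spec:
  selector:
    matchLabels:
      "k8s.enterprisedb.io/cluster": cluster-example
  podMetricsEndpoints:
    - port: metrics
      scheme: https
      tlsConfig:
        ca:
          secret:
            # CA used to verify the server certificate of the instances
            name: cluster-example-ca
            key: ca.crt
        # must match a name in the server certificate
        serverName: cluster-example-rw
```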
Important
Ensure you modify the example above with a unique name, as well as the
correct Cluster's namespace and labels (e.g., `cluster-example`).
Important
The `serverName` field in the metrics endpoint must match one of the names
defined in the server certificate. If the default certificate is in use,
the `serverName` value should be in the format `<cluster-name>-rw`.
Predefined set of metrics
Every PostgreSQL instance exporter automatically exposes a set of predefined metrics, which can be classified into two major categories:
- PostgreSQL-related metrics, starting with `cnp_collector_*`, including:
  - number of WAL files and total size on disk
  - number of `.ready` and `.done` files in the archive status folder
  - requested minimum and maximum number of synchronous replicas, as well as the expected and actually observed values
  - number of distinct nodes accommodating the instances
  - timestamps indicating last failed and last available backup, as well as the first point of recoverability for the cluster
  - flag indicating if replica cluster mode is enabled or disabled
  - flag indicating if a manual switchover is required
  - flag indicating if fencing is enabled or disabled
- Go runtime-related metrics, starting with `go_*`
Below is a sample of the metrics returned by the `localhost:9187/metrics`
endpoint of an instance. As you can see, the Prometheus format is
self-documenting:
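An abridged, illustrative excerpt (metric values and the `cluster-example` label are hypothetical):

```
# HELP cnp_collector_postgres_version Postgres version
# TYPE cnp_collector_postgres_version gauge
cnp_collector_postgres_version{cluster="cluster-example",full="16.2"} 16.2
# HELP cnp_collector_up 1 if PostgreSQL is up, 0 otherwise.
# TYPE cnp_collector_up gauge
cnp_collector_up{cluster="cluster-example"} 1
```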
Note
`cnp_collector_postgres_version` is a GaugeVec metric containing the
`Major.Minor` version of Postgres (either PostgreSQL or EPAS). The full
semantic version `Major.Minor.Patch` can be found inside one of its label
fields, named `full`.
Note
`cnp_collector_first_recoverability_point` and `cnp_collector_last_available_backup_timestamp` will be zero until your first backup to the object store. This is separate from WAL archival.
User defined metrics
This feature is currently in beta state and the format is inspired by the queries.yaml file (release 0.12) of the PostgreSQL Prometheus Exporter.
Custom metrics can be defined by users by referring to the created
`ConfigMap`/`Secret` in a `Cluster` definition
under the `.spec.monitoring.customQueriesConfigMap`
or `customQueriesSecret` section, as in the following example:
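A sketch of such a Cluster definition; the `example-monitoring` ConfigMap name and `custom-queries` key are illustrative:

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    size: 1Gi
  monitoring:
    customQueriesConfigMap:
      # reference to a ConfigMap in the same namespace as the Cluster
      - name: example-monitoring
        key: custom-queries
```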
The `customQueriesConfigMap`/`customQueriesSecret` sections contain a list of
`ConfigMap`/`Secret` references specifying the key in which the custom queries are defined.
Note that the referenced resources must be created in the same namespace as the Cluster resource.
Note
If you want ConfigMaps and Secrets to be automatically reloaded by instances, you can
add a label with key `k8s.enterprisedb.io/reload` to them; otherwise, you will have to reload
the instances using the `kubectl cnp reload` subcommand.
Important
When a user-defined metric overwrites an already existing metric, the instance manager prints a JSON warning log
containing the message `Query with the same name already found. Overwriting the existing one.`
and a key `queryName` containing the overwritten query name.
Example of a user defined metric
Here you can see an example of a `ConfigMap` containing a single custom query,
referenced by the `Cluster` example above:
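A sketch of that `ConfigMap`, defining a `pg_replication` metric that measures replication lag (name, namespace, and key are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-monitoring
  labels:
    # allow automatic reload of the queries by the instances
    k8s.enterprisedb.io/reload: ""
data:
  custom-queries: |
    pg_replication:
      query: "SELECT CASE WHEN NOT pg_is_in_recovery()
              THEN 0
              ELSE GREATEST (0,
                EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
              END AS lag"
      metrics:
        - lag:
            usage: "GAUGE"
            description: "Replication lag behind primary in seconds"
```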
A list of basic monitoring queries can be found in the
`default-monitoring.yaml` file
that is already installed in your EDB Postgres for Kubernetes deployment (see "Default set of metrics").
Example of a user defined metric with predicate query
The `predicate_query` option allows the user to execute the query
to collect the metrics only under the specified conditions.
To do so, the user needs to provide a predicate query that returns at most one row with a single `boolean` column.
The predicate query is executed in the same transaction as the main query and against the same databases.
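A sketch of a metric guarded by a predicate query; the table `some_table` and its columns are hypothetical:

```yaml
some_query:
  # the main query runs only if the predicate returns true
  predicate_query: |
    SELECT
      some_bool AS predicate
    FROM some_table
  query: |
    SELECT
      count(*) AS rows
    FROM some_table
  metrics:
    - rows:
        usage: "GAUGE"
        description: "number of rows"
```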
Example of a user defined metric running on multiple databases
If the `target_databases` option lists more than one database,
the metric is collected from each of them.
Database auto-discovery can be enabled for a specific query by specifying a
shell-like pattern (i.e., containing `*`, `?` or `[]`) in the list of
`target_databases`. If provided, the operator will expand the list of target
databases by adding all the databases returned by the execution of
`SELECT datname FROM pg_database WHERE datallowconn AND NOT datistemplate`
and matching the pattern according to `path.Match()` rules.
Note
The `*` character has a special meaning in YAML,
so you need to quote (`"*"`) the `target_databases`
value when it includes such a pattern.
It is recommended that you always include the name of the database
in the returned labels, for example using the `current_database()` function,
as in the following example:
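A sketch of such a query; the table `some_table` and the database names are hypothetical:

```yaml
some_query:
  query: |
    SELECT
      current_database() AS datname,
      count(*) AS rows
    FROM some_table
  metrics:
    - datname:
        # expose the database name as a label
        usage: "LABEL"
        description: "Name of current database"
    - rows:
        usage: "GAUGE"
        description: "number of rows"
  target_databases:
    - albert
    - bb
    - freddie
```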
This will result in the following metrics being exposed:
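An illustrative output, assuming three databases named `albert`, `bb` and `freddie` (row counts are hypothetical):

```
cnp_some_query_rows{datname="albert"} 2
cnp_some_query_rows{datname="bb"} 5
cnp_some_query_rows{datname="freddie"} 10
```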
Here is an example of a query with auto-discovery enabled which also
runs on the `template1` database (otherwise not returned by the
aforementioned query):
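A sketch of such a definition; the table `some_table` is hypothetical, and note the quoted `"*"` pattern:

```yaml
some_query:
  query: |
    SELECT
      current_database() AS datname,
      count(*) AS rows
    FROM some_table
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of current database"
    - rows:
        usage: "GAUGE"
        description: "number of rows"
  target_databases:
    - template1
    # auto-discover all connectable, non-template databases
    - "*"
```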
The above example will produce the following metrics (provided the databases exist):
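An illustrative output, assuming user databases `albert`, `bb` and `freddie` exist alongside `template1` and `postgres` (all values are hypothetical):

```
cnp_some_query_rows{datname="albert"} 2
cnp_some_query_rows{datname="bb"} 5
cnp_some_query_rows{datname="freddie"} 10
cnp_some_query_rows{datname="template1"} 7
cnp_some_query_rows{datname="postgres"} 42
```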
Structure of a user defined metric
Every custom query has the following basic structure:
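A skeleton of that structure, with the placeholders described below:

```yaml
<MetricName>:
  query: "<SQLQuery>"
  metrics:
    - <ColumnName>:
        usage: "<MetricType>"
        description: "<MetricDescription>"
```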
Here is a short description of all the available fields:
- `<MetricName>`: the name of the Prometheus metric
- `name`: override `<MetricName>`, if defined
- `query`: the SQL query to run on the target database to generate the metrics
- `primary`: whether to run the query only on the primary instance
- `master`: same as `primary` (for compatibility with the Prometheus PostgreSQL exporter's syntax - deprecated)
- `runonserver`: a semantic version range to limit the versions of PostgreSQL the query should run on (e.g. `">=11.0.0"` or `">=12.0.0 <=15.0.0"`)
- `target_databases`: a list of databases to run the `query` against, or a shell-like pattern to enable auto-discovery. Overwrites the default database if provided.
- `predicate_query`: a SQL query that returns at most one row and one `boolean` column to run on the target database. The system evaluates the predicate and, if `true`, executes the `query`.
- `metrics`: section containing a list of all exported columns, defined as follows:
  - `<ColumnName>`: the name of the column returned by the query
  - `name`: override the `ColumnName` of the column in the metric, if defined
  - `usage`: one of the values described below
  - `description`: the metric's description
  - `metrics_mapping`: the optional column mapping when `usage` is set to `MAPPEDMETRIC`
The possible values for `usage` are:
| Column Usage Label | Description |
|--------------------|-------------|
| `DISCARD` | this column should be ignored |
| `LABEL` | use this column as a label |
| `COUNTER` | use this column as a counter |
| `GAUGE` | use this column as a gauge |
| `MAPPEDMETRIC` | use this column with the supplied mapping of text values |
| `DURATION` | use this column as a text duration (in milliseconds) |
| `HISTOGRAM` | use this column as a histogram |
Please visit the "Metric Types" page from the Prometheus documentation for more information.
Output of a user defined metric
Custom defined metrics are returned by the Prometheus exporter endpoint (`:9187/metrics`)
with the following format:
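A sketch of the output template, using the placeholders from the structure above:

```
# HELP cnp_<MetricName>_<ColumnName> <MetricDescription>
# TYPE cnp_<MetricName>_<ColumnName> <MetricType>
cnp_<MetricName>_<ColumnName>{<LabelColumnName>=<LabelValue> ... } <ColumnValue>
```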
Note
`LabelColumnName` entries are the columns with `usage` set to `LABEL`; their values become the corresponding label values.
Considering the `pg_replication` example above, the exporter's endpoint would
return the following output when invoked:
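An illustrative output for that metric (the lag value is hypothetical):

```
# HELP cnp_pg_replication_lag Replication lag behind primary in seconds
# TYPE cnp_pg_replication_lag gauge
cnp_pg_replication_lag 0
```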
Default set of metrics
The operator can be configured to automatically inject in a Cluster a set of
monitoring queries defined in a ConfigMap or a Secret, inside the operator's namespace.
You have to set the `MONITORING_QUERIES_CONFIGMAP` or
`MONITORING_QUERIES_SECRET` key in the "operator configuration",
respectively to the name of the ConfigMap or the Secret;
the operator will then use the content of the `queries` key.
Any change to the `queries` content will be immediately reflected on all the
deployed Clusters using it.
The operator installation manifests come with a predefined ConfigMap,
called `postgresql-operator-default-monitoring`, to be used by all Clusters.
`MONITORING_QUERIES_CONFIGMAP` is by default set to `postgresql-operator-default-monitoring`
in the operator configuration.
If you want to disable the default set of metrics, you can:
- disable it at operator level: set the `MONITORING_QUERIES_CONFIGMAP`/`MONITORING_QUERIES_SECRET` key to `""` (empty string) in the operator ConfigMap. Changes to the operator ConfigMap require an operator restart.
- disable it for a specific Cluster: set `.spec.monitoring.disableDefaultQueries` to `true` in the Cluster.
Important
The ConfigMap or Secret specified via `MONITORING_QUERIES_CONFIGMAP`/`MONITORING_QUERIES_SECRET`
will always be copied to the Cluster's namespace with a fixed name: `postgresql-operator-default-monitoring`.
Therefore, if you intend to use the default metrics, you should not create a ConfigMap with this name in the Cluster's namespace.
Differences with the Prometheus Postgres exporter
EDB Postgres for Kubernetes is inspired by the PostgreSQL Prometheus Exporter, but
presents some differences. In particular, the `cache_seconds` field is not implemented
in EDB Postgres for Kubernetes' exporter.
Monitoring the operator
The operator internally exposes Prometheus metrics
via HTTP on port 8080, named `metrics`.
Info
You can inspect the exported metrics by following the instructions in the "How to inspect the exported metrics" section below.
Currently, the operator exposes default `kubebuilder` metrics; see the
kubebuilder documentation for more details.
Prometheus Operator example
The operator deployment can be monitored using the Prometheus Operator by defining the following PodMonitor resource:
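A sketch of such a `PodMonitor`; the namespace and label selector below assume a default operator installation and may differ in yours:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: postgresql-operator-controller-manager
  namespace: postgresql-operator-system
spec:
  selector:
    matchLabels:
      # adjust to the labels of your operator deployment's pods
      app.kubernetes.io/name: cloud-native-postgresql
  podMetricsEndpoints:
    - port: metrics
```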
How to inspect the exported metrics
In this section we provide some basic instructions on how to inspect
the metrics exported by a specific PostgreSQL instance manager (primary
or replica) or the operator, using a temporary pod running `curl` in
the same namespace.
Note
In the example below, we assume we are working in the `default` namespace, alongside the PostgreSQL cluster. Please feel free to adapt this example to your use case by applying basic Kubernetes knowledge.
Create the `curl.yaml` file with this content:
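A minimal sketch of the pod manifest; the `curlimages/curl` image is one possible choice:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: curl
spec:
  containers:
    - name: curl
      image: curlimages/curl:latest
      # keep the pod alive so we can exec into it
      command: ['sleep', '3600']
```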
Then create the pod:
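For example:

```shell
kubectl apply -f curl.yaml
```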
In case you want to inspect the metrics exported by an instance, you need to connect to port 9187 of the target pod. This is the generic command to be run (make sure you use the correct IP for the pod):
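A sketch of that command, with `<pod_ip>` to be replaced by the target pod's IP:

```shell
kubectl exec -ti curl -- curl -s <pod_ip>:9187/metrics
```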
For example, if your PostgreSQL cluster is called `cluster-example` and
you want to retrieve the exported metrics of the first pod in the cluster,
you can run the following command to programmatically get the IP of
that pod:
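One way to do this is to store the IP in a shell variable:

```shell
POD_IP=$(kubectl get pod cluster-example-1 --template '{{.status.podIP}}')
```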
And then run:
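Assuming the pod IP from the previous step is in the `POD_IP` variable:

```shell
kubectl exec -ti curl -- curl -s ${POD_IP}:9187/metrics
```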
If you enabled TLS metrics, run instead:
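For example (the `-k` flag skips server certificate verification, since the `curl` pod does not trust the cluster's CA; for a verified connection, mount the CA secret instead):

```shell
kubectl exec -ti curl -- curl -sk https://${POD_IP}:9187/metrics
```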
In case you want to access the metrics of the operator, you need to point to the pod where the operator is running, and use TCP port 8080 as target.
At the end of the inspection, please make sure you delete the `curl` pod:
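For example:

```shell
kubectl delete -f curl.yaml
```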
Auxiliary resources
Important
These resources are provided for illustration and experimentation, and do not represent any kind of recommendation for your production system.
In the `doc/src/samples/monitoring/` directory you will find a series of sample files for observability.
Please refer to Part 4 of the quickstart section for context:
- `kube-stack-config.yaml`: a configuration file for the kube-stack Helm chart installation. It ensures that Prometheus listens for all PodMonitor resources.
- `prometheusrule.yaml`: a `PrometheusRule` with alerts for EDB Postgres for Kubernetes. NOTE: this does not include inter-operation with notification services. Please refer to the Prometheus documentation.
- `podmonitor.yaml`: a `PodMonitor` for the EDB Postgres for Kubernetes Operator deployment.
In addition, we provide the "raw" sources for the Prometheus alert rules in the
`alerts.yaml` file.
The Grafana dashboard now has a dedicated repository.
Note that, for the configuration of `kube-prometheus-stack`, other fields and
settings are available beyond what we provide in `kube-stack-config.yaml`.
You can execute `helm show values prometheus-community/kube-prometheus-stack`
to view them. For further information, please refer to the
`kube-prometheus-stack` page.
Monitoring on OpenShift
Starting with OpenShift 4.6, there is a complete monitoring stack called
"Monitoring for user-defined projects",
which can be enabled by cluster administrators. EDB Postgres for Kubernetes will
automatically create a `PodMonitor` object if the option
`spec.monitoring.enablePodMonitor` of the `Cluster` definition is set to
`true`.
To enable cluster-wide user-defined monitoring, you must first create a
`ConfigMap` with the name `cluster-monitoring-config` in the
`openshift-monitoring` namespace/project with the following content:
If the `ConfigMap` already exists, just add the variable `enableUserWorkload: true`.
Important
This will enable monitoring for the whole cluster. If it is needed only for one namespace/project, please refer to the official Red Hat documentation or talk with your cluster administrator.
After that, just create the proper PodMonitor in the namespace/project with something similar to this:
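A sketch of such a `PodMonitor`; the name and the `cluster-example` label value are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cluster-example
spec:
  selector:
    matchLabels:
      "k8s.enterprisedb.io/cluster": cluster-example
  podMetricsEndpoints:
    - port: metrics
```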
Note
We currently don't use `ServiceMonitor` because our service doesn't define
a port pointing to the metrics. If we added a metrics port, this could expose
sensitive data.