update wait_class query; add vault; cleanup logs; update some deps #34

Merged: 6 commits, Oct 27, 2023
2 changes: 1 addition & 1 deletion Makefile
@@ -3,7 +3,7 @@ OS_TYPE ?= $(shell uname -s | tr '[:upper:]' '[:lower:]')
ARCH_TYPE ?= $(subst x86_64,amd64,$(patsubst i%86,386,$(ARCH)))
GOOS ?= $(shell go env GOOS)
GOARCH ?= $(shell go env GOARCH)
VERSION ?= 1.0.0
VERSION ?= 1.1.0
LDFLAGS := -X main.Version=$(VERSION)
GOFLAGS := -ldflags "$(LDFLAGS) -s -w"
BUILD_ARGS = --build-arg VERSION=$(VERSION)
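
The bumped `VERSION` flows into the binary through the `-X main.Version=$(VERSION)` linker flag in `GOFLAGS` above. A minimal sketch of that mechanism (the `main.Version` variable name is implied by the flag; the surrounding program is illustrative):

```go
package main

import "fmt"

// Version holds the build-time default; the linker overwrites it via:
//   go build -ldflags "-X main.Version=1.1.0 -s -w"
var Version = "dev"

func main() {
	fmt.Println("oracledb_exporter version", Version)
}
```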
28 changes: 24 additions & 4 deletions README.md
@@ -2,13 +2,14 @@

This project aims to provide observability for the Oracle Database so that users can understand performance and diagnose issues easily across applications and databases. Over time, this project will provide not just metrics, but also logging and tracing support, and integration into popular frameworks like Spring Boot. The project aims to deliver functionality to support both cloud and on-premises databases, including those running in Kubernetes and containers.

In the first production release, v1.0, this project provides a [Prometheus](https://prometheus.io/) exporter for Oracle Database that is based in part on a Prometheus exporter created by [Seth Miller](https://github.com/iamseth/oracledb_exporter) with changes to comply with various Oracle standards and policies.
From the first production release, v1.0, onwards, this project provides a [Prometheus](https://prometheus.io/) exporter for Oracle Database that is based in part on a Prometheus exporter created by [Seth Miller](https://github.com/iamseth/oracledb_exporter) with changes to comply with various Oracle standards and policies.

Contributions are welcome - please see [contributing](CONTRIBUTING.md).


### Table of Contents

- [Release Notes](#release-notes)
- [Roadmap](#roadmap)
- [Standard metrics](#standard-metrics)
- [Database permissions required](#database-permissions-required)
@@ -17,15 +18,26 @@ Contributions are welcome - please see [contributing](CONTRIBUTING.md).
- [Test/demo environment using Docker Compose](#testdemo-environment-with-docker-compose)
- [Kubernetes](#kubernetes)
- [Standalone binary](#standalone-binary)
- [Using OCI Vault](#using-oci-vault)
- [Custom metrics](#custom-metrics)
- [Grafana dashboards](#grafana-dashboards)
- [Monitoring Transactional Event Queues](#monitoring-transactional-event-queues)
- [Developer notes](#developer-notes)

## Release Notes

## Roadmap
### Version 1.1, October 27, 2023

This release includes the following changes:

### Version 1.0
- The query for the standard metric `wait_class` has been updated so that it works in both container databases
and pluggable databases, including Oracle Autonomous Database instances. Note that this query will not return
any data unless the database instance is under load.
- Support for reading the database password from OCI Vault has been added (see [details](#using-oci-vault)).
- Log messages have been cleaned up to use structured key/value pairs (a short sketch of the convention follows this list).
- Some dependencies have been updated.
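
The logging cleanup in `collector/collector.go` below replaces bare calls such as `Log("Reconnecting to DB")` with alternating key/value pairs. A minimal sketch of why the pairing matters, assuming the `github.com/go-kit/log` module this exporter uses:

```go
package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	logger := log.NewLogfmtLogger(os.Stderr)

	// Old style: a single argument is treated as a key with a missing
	// value, not as the log message, so it does not render as a msg field.
	level.Info(logger).Log("Reconnecting to DB")

	// New style: alternating key/value pairs render cleanly, e.g.
	//   level=info msg="Reconnecting to DB"
	level.Info(logger).Log("msg", "Reconnecting to DB")
}
```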

### Version 1.0, September 13, 2023

The first production release, v1.0, includes the following features:

@@ -41,7 +53,7 @@ Note that this exporter uses a different Oracle Database driver which in turn us

The interfaces for this version have been kept as close as possible to those of earlier alpha releases in this repository to assist with migration. However, it should be expected that there may be breaking changes in future releases.

### Plans
## Roadmap

We always welcome input on features you would like to see supported. Please open an issue in this repository with your suggestions.

@@ -355,6 +367,14 @@ Usage of oracledb_exporter:
Path to configuration file that can enable TLS or authentication.
```

### Using OCI Vault

The exporter will read the password from a secret stored in OCI Vault if you set these two environment
variables:

- `VAULT_ID` should be set to the OCID of the OCI vault that you wish to use
- `VAULT_SECRET_NAME` should be set to the name of the secret in the OCI vault which contains the database password
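
A minimal sketch of the lookup these two variables drive, using the OCI Go SDK (`github.com/oracle/oci-go-sdk`); this illustrates the SDK calls involved and is not the exporter's own code:

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"os"

	"github.com/oracle/oci-go-sdk/v65/common"
	"github.com/oracle/oci-go-sdk/v65/secrets"
)

func main() {
	vaultID := os.Getenv("VAULT_ID")
	secretName := os.Getenv("VAULT_SECRET_NAME")

	client, err := secrets.NewSecretsClientWithConfigurationProvider(common.DefaultConfigProvider())
	if err != nil {
		panic(err)
	}

	// Fetch the current version of the named secret from the vault.
	resp, err := client.GetSecretBundleByName(context.Background(), secrets.GetSecretBundleByNameRequest{
		VaultId:    common.String(vaultID),
		SecretName: common.String(secretName),
	})
	if err != nil {
		panic(err)
	}

	// Secret content arrives base64-encoded.
	content := resp.SecretBundleContent.(secrets.Base64SecretBundleContentDetails)
	password, err := base64.StdEncoding.DecodeString(*content.Content)
	if err != nil {
		panic(err)
	}
	fmt.Printf("retrieved a %d-byte password\n", len(password))
}
```

The default configuration provider reads the standard `~/.oci/config` file; other environments (for example, instance principals) would need a different provider.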

## Custom metrics

The exporter allows definition of arbitrary custom metrics in a TOML file. To specify this file to the
97 changes: 53 additions & 44 deletions collector/collector.go
@@ -261,21 +261,22 @@ func (e *Exporter) scrape(ch chan<- prometheus.Metric) {

if err = e.db.Ping(); err != nil {
if strings.Contains(err.Error(), "sql: database is closed") {
level.Info(e.logger).Log("Reconnecting to DB")
level.Info(e.logger).Log("msg", "Reconnecting to DB")
err = e.connect()
if err != nil {
level.Error(e.logger).Log("Error reconnecting to DB", err)
level.Error(e.logger).Log("msg", "Error reconnecting to DB", err)
}
}
}

if err = e.db.Ping(); err != nil {
level.Error(e.logger).Log("Error pinging oracle:", err)
level.Error(e.logger).Log("msg", "Error pinging oracle:",
"error", err)
e.up.Set(0)
return
}

level.Debug(e.logger).Log("Successfully pinged Oracle database: ", maskDsn(e.connectString))
level.Debug(e.logger).Log("msg", "Successfully pinged Oracle database: "+maskDsn(e.connectString))
e.up.Set(1)

if e.checkIfMetricsChanged() {
@@ -291,50 +292,57 @@ func (e *Exporter) scrape(ch chan<- prometheus.Metric) {
go func() {
defer wg.Done()

level.Debug(e.logger).Log("About to scrape metric: ")
level.Debug(e.logger).Log("- Metric MetricsDesc: ", metric.MetricsDesc)
level.Debug(e.logger).Log("- Metric Context: ", metric.Context)
level.Debug(e.logger).Log("- Metric MetricsType: ", metric.MetricsType)
level.Debug(e.logger).Log("- Metric MetricsBuckets: ", metric.MetricsBuckets, "(Ignored unless Histogram type)")
level.Debug(e.logger).Log("- Metric Labels: ", metric.Labels)
level.Debug(e.logger).Log("- Metric FieldToAppend: ", metric.FieldToAppend)
level.Debug(e.logger).Log("- Metric IgnoreZeroResult: ", metric.IgnoreZeroResult)
level.Debug(e.logger).Log("- Metric Request: ", metric.Request)
level.Debug(e.logger).Log("msg", "About to scrape metric",
"Context", metric.Context,
"MetricsDesc", fmt.Sprint(metric.MetricsDesc),
"MetricsType", fmt.Sprint(metric.MetricsType),
"MetricsBuckets", fmt.Sprint(metric.MetricsBuckets), // ignored unless histogram
"Labels", fmt.Sprint(metric.Labels),
"FieldToAppend", metric.FieldToAppend,
"IgnoreZeroResult", metric.IgnoreZeroResult,
"Request", metric.Request)

if len(metric.Request) == 0 {
level.Error(e.logger).Log("Error scraping for ", metric.MetricsDesc, ". Did you forget to define request in your toml file?")
level.Error(e.logger).Log("msg", "Error scraping for "+fmt.Sprint(metric.MetricsDesc)+". Did you forget to define request in your toml file?")
return
}

if len(metric.MetricsDesc) == 0 {
level.Error(e.logger).Log("Error scraping for query", metric.Request, ". Did you forget to define metricsdesc in your toml file?")
level.Error(e.logger).Log("msg", "Error scraping for query"+fmt.Sprint(metric.Request)+". Did you forget to define metricsdesc in your toml file?")
return
}

for column, metricType := range metric.MetricsType {
if metricType == "histogram" {
_, ok := metric.MetricsBuckets[column]
if !ok {
level.Error(e.logger).Log("Unable to find MetricsBuckets configuration key for metric. (metric=" + column + ")")
level.Error(e.logger).Log("msg", "Unable to find MetricsBuckets configuration key for metric. (metric="+column+")")
return
}
}
}

scrapeStart := time.Now()
if err = e.ScrapeMetric(e.db, ch, metric); err != nil {
level.Error(e.logger).Log("Error scraping for", metric.Context, "_", metric.MetricsDesc, time.Since(scrapeStart), ":", err)
level.Error(e.logger).Log("msg", "Error scraping metric",
"Context", metric.Context,
"MetricsDesc", fmt.Sprint(metric.MetricsDesc),
"time", time.Since(scrapeStart),
"error", err)
e.scrapeErrors.WithLabelValues(metric.Context).Inc()
} else {
level.Debug(e.logger).Log("Successfully scraped metric: ", metric.Context, metric.MetricsDesc, time.Since(scrapeStart))
level.Debug(e.logger).Log("msg", "Successfully scraped metric",
"Context", metric.Context,
"MetricDesc", fmt.Sprint(metric.MetricsDesc),
"time", time.Since(scrapeStart))
}
}()
}
wg.Wait()
}

func (e *Exporter) connect() error {
level.Debug(e.logger).Log("Launching connection: ", maskDsn(e.connectString))
level.Debug(e.logger).Log("msg", "Launching connection to "+maskDsn(e.connectString))

var P godror.ConnectionParams
P.Username, P.Password, P.ConnectString = e.user, godror.NewPassword(e.password), e.connectString
@@ -344,11 +352,11 @@ func (e *Exporter) connect() error {
// level.Error(e.logger).Log("Error while connecting to", e.dsn)
// return err
// }
level.Debug(e.logger).Log("set max idle connections to ", e.config.MaxIdleConns)
level.Debug(e.logger).Log("msg", "set max idle connections to "+strconv.Itoa(e.config.MaxIdleConns))
db.SetMaxIdleConns(e.config.MaxIdleConns)
level.Debug(e.logger).Log("set max open connections to ", e.config.MaxOpenConns)
level.Debug(e.logger).Log("msg", "set max open connections to "+strconv.Itoa(e.config.MaxOpenConns))
db.SetMaxOpenConns(e.config.MaxOpenConns)
level.Debug(e.logger).Log("Successfully connected to: ", maskDsn(e.connectString))
level.Debug(e.logger).Log("msg", "Successfully connected to "+maskDsn(e.connectString))
e.db = db
return nil
}
@@ -358,15 +366,15 @@ func (e *Exporter) checkIfMetricsChanged() bool {
if len(_customMetrics) == 0 {
continue
}
level.Debug(e.logger).Log("Checking modifications in following metrics definition file:", _customMetrics)
level.Debug(e.logger).Log("msg", "Checking modifications in following metrics definition file:"+_customMetrics)
h := sha256.New()
if err := hashFile(h, _customMetrics); err != nil {
level.Error(e.logger).Log("Unable to get file hash", err)
level.Error(e.logger).Log("msg", "Unable to get file hash", "error", err)
return false
}
// If any of files has been changed reload metrics
if !bytes.Equal(hashMap[i], h.Sum(nil)) {
level.Info(e.logger).Log(_customMetrics, "has been changed. Reloading metrics...")
level.Info(e.logger).Log("msg", _customMetrics+" has been changed. Reloading metrics...")
hashMap[i] = h.Sum(nil)
return true
}
@@ -401,18 +409,18 @@ func (e *Exporter) reloadMetrics() {
level.Error(e.logger).Log(err)
panic(errors.New("Error while loading " + _customMetrics))
} else {
level.Info(e.logger).Log("Successfully loaded custom metrics from: " + _customMetrics)
level.Info(e.logger).Log("msg", "Successfully loaded custom metrics from "+_customMetrics)
}
e.metricsToScrape.Metric = append(e.metricsToScrape.Metric, additionalMetrics.Metric...)
}
} else {
level.Debug(e.logger).Log("No custom metrics defined.")
level.Debug(e.logger).Log("msg", "No custom metrics defined.")
}
}

// ScrapeMetric is an interface method to call scrapeGenericValues using Metric struct values
func (e *Exporter) ScrapeMetric(db *sql.DB, ch chan<- prometheus.Metric, metricDefinition Metric) error {
level.Debug(e.logger).Log("Calling function ScrapeGenericValues()")
level.Debug(e.logger).Log("msg", "Calling function ScrapeGenericValues()")
return e.scrapeGenericValues(db, ch, metricDefinition.Context, metricDefinition.Labels,
metricDefinition.MetricsDesc, metricDefinition.MetricsType, metricDefinition.MetricsBuckets,
metricDefinition.FieldToAppend, metricDefinition.IgnoreZeroResult,
@@ -434,11 +442,12 @@ func (e *Exporter) scrapeGenericValues(db *sql.DB, ch chan<- prometheus.Metric,
value, err := strconv.ParseFloat(strings.TrimSpace(row[metric]), 64)
// If not a float, skip current metric
if err != nil {
level.Error(e.logger).Log("Unable to convert current value to float (metric=" + metric +
",metricHelp=" + metricHelp + ",value=<" + row[metric] + ">)")
level.Error(e.logger).Log("msg", "Unable to convert current value to float (metric="+metric+
",metricHelp="+metricHelp+",value=<"+row[metric]+">)")
continue
}
level.Debug(e.logger).Log("Query result looks like: ", value)
level.Debug(e.logger).Log("msg", "Query result",
"value", value)
// If metric do not use a field content in metric's name
if strings.Compare(fieldToAppend, "") == 0 {
desc := prometheus.NewDesc(
@@ -449,21 +458,21 @@ func (e *Exporter) scrapeGenericValues(db *sql.DB, ch chan<- prometheus.Metric,
if metricsType[strings.ToLower(metric)] == "histogram" {
count, err := strconv.ParseUint(strings.TrimSpace(row["count"]), 10, 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert count value to int (metric=" + metric +
",metricHelp=" + metricHelp + ",value=<" + row["count"] + ">)")
level.Error(e.logger).Log("msg", "Unable to convert count value to int (metric="+metric+
",metricHelp="+metricHelp+",value=<"+row["count"]+">)")
continue
}
buckets := make(map[float64]uint64)
for field, le := range metricsBuckets[metric] {
lelimit, err := strconv.ParseFloat(strings.TrimSpace(le), 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert bucket limit value to float (metric=" + metric +
",metricHelp=" + metricHelp + ",bucketlimit=<" + le + ">)")
level.Error(e.logger).Log("msg", "Unable to convert bucket limit value to float (metric="+metric+
",metricHelp="+metricHelp+",bucketlimit=<"+le+">)")
continue
}
counter, err := strconv.ParseUint(strings.TrimSpace(row[field]), 10, 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert ", field, " value to int (metric="+metric+
level.Error(e.logger).Log("msg", "Unable to convert ", field, " value to int (metric="+metric+
",metricHelp="+metricHelp+",value=<"+row[field]+">)")
continue
}
@@ -483,21 +492,21 @@ func (e *Exporter) scrapeGenericValues(db *sql.DB, ch chan<- prometheus.Metric,
if metricsType[strings.ToLower(metric)] == "histogram" {
count, err := strconv.ParseUint(strings.TrimSpace(row["count"]), 10, 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert count value to int (metric=" + metric +
",metricHelp=" + metricHelp + ",value=<" + row["count"] + ">)")
level.Error(e.logger).Log("msg", "Unable to convert count value to int (metric="+metric+
",metricHelp="+metricHelp+",value=<"+row["count"]+">)")
continue
}
buckets := make(map[float64]uint64)
for field, le := range metricsBuckets[metric] {
lelimit, err := strconv.ParseFloat(strings.TrimSpace(le), 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert bucket limit value to float (metric=" + metric +
",metricHelp=" + metricHelp + ",bucketlimit=<" + le + ">)")
level.Error(e.logger).Log("msg", "Unable to convert bucket limit value to float (metric="+metric+
",metricHelp="+metricHelp+",bucketlimit=<"+le+">)")
continue
}
counter, err := strconv.ParseUint(strings.TrimSpace(row[field]), 10, 64)
if err != nil {
level.Error(e.logger).Log("Unable to convert ", field, " value to int (metric="+metric+
level.Error(e.logger).Log("msg", "Unable to convert ", field, " value to int (metric="+metric+
",metricHelp="+metricHelp+",value=<"+row[field]+">)")
continue
}
@@ -512,14 +521,14 @@ func (e *Exporter) scrapeGenericValues(db *sql.DB, ch chan<- prometheus.Metric,
}
return nil
}
level.Debug(e.logger).Log("Calling function GeneratePrometheusMetrics()")
level.Debug(e.logger).Log("msg", "Calling function GeneratePrometheusMetrics()")
err := e.generatePrometheusMetrics(db, genericParser, request)
level.Debug(e.logger).Log("ScrapeGenericValues() - metricsCount: ", metricsCount)
level.Debug(e.logger).Log("msg", "ScrapeGenericValues() - metricsCount: "+strconv.Itoa(metricsCount))
if err != nil {
return err
}
if !ignoreZeroResult && metricsCount == 0 {
return errors.New("No metrics found while parsing")
return errors.New("no metrics found while parsing, query returned no rows")
}
return err
}
15 changes: 7 additions & 8 deletions collector/default_metrics.go
@@ -50,13 +50,11 @@ context = "wait_time"
metricsdesc = { value="Generic counter metric from v$waitclassmetric view in Oracle." }
fieldtoappend= "wait_class"
request = '''
SELECT
n.wait_class as WAIT_CLASS,
round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE
FROM
v$waitclassmetric m, v$system_wait_class n
WHERE
m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'
SELECT wait_class as WAIT_CLASS, sum(time_waited) as VALUE
FROM gv$active_session_history
where wait_class is not null
and sample_time > sysdate - interval '1' hour
GROUP BY wait_class
'''

[[metric]]
@@ -82,7 +80,8 @@ func (e *Exporter) DefaultMetrics() Metrics {
var metricsToScrape Metrics
if e.config.DefaultMetricsFile != "" {
if _, err := toml.DecodeFile(filepath.Clean(e.config.DefaultMetricsFile), &metricsToScrape); err != nil {
level.Error(e.logger).Log(fmt.Sprintf("there was an issue while loading specified default metrics file at: "+e.config.DefaultMetricsFile+", proceeding to run with default metrics."), err)
level.Error(e.logger).Log("msg", fmt.Sprintf("there was an issue while loading specified default metrics file at: "+e.config.DefaultMetricsFile+", proceeding to run with default metrics."),
"error", err)
}
return metricsToScrape
}
18 changes: 10 additions & 8 deletions default-metrics.toml
@@ -8,7 +8,10 @@ request = "SELECT status, type, COUNT(*) as value FROM v$session GROUP BY status
context = "resource"
labels = [ "resource_name" ]
metricsdesc = { current_utilization= "Generic counter metric from v$resource_limit view in Oracle (current value).", limit_value="Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1)." }
request="SELECT resource_name,current_utilization,CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value FROM v$resource_limit"
request = '''
SELECT resource_name, current_utilization, CASE WHEN TRIM(limit_value) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(limit_value) END as limit_value
FROM v$resource_limit
'''

[[metric]]
context = "asm_diskgroup"
@@ -33,14 +36,13 @@ context = "wait_time"
metricsdesc = { value="Generic counter metric from v$waitclassmetric view in Oracle." }
fieldtoappend= "wait_class"
request = '''
SELECT
n.wait_class as WAIT_CLASS,
round(m.time_waited/m.INTSIZE_CSEC,3) as VALUE
FROM
v$waitclassmetric m, v$system_wait_class n
WHERE
m.wait_class_id=n.wait_class_id AND n.wait_class != 'Idle'
SELECT wait_class as WAIT_CLASS, sum(time_waited) as VALUE
FROM gv$active_session_history
where wait_class is not null
and sample_time > sysdate - interval '1' hour
GROUP BY wait_class
'''
ignorezeroresult = true

[[metric]]
context = "tablespace"
2 changes: 1 addition & 1 deletion docker-compose/compose.yaml
@@ -43,7 +43,7 @@ services:
start_period: 30s

exporter:
image: container-registry.oracle.com/database/observability-exporter:1.0.0
image: container-registry.oracle.com/database/observability-exporter:1.0.1
container_name: exporter
ports:
- 9161:9161