Resource and Entities

Modelling Guide

To define an entity, create a new semantic convention model file containing a group of type entity, for example:

model/{my_domain}/entities.yaml:

groups:
  - id: entity.my_entity
    type: entity
    stability: development
    name: my_entity
    brief: >
      A description of my_entity here.
    attributes:
      - ref: some.attribute
        role: identifying
      - ref: some.other_attribute
        role: descriptive
        ...

Here, the attributes field lists all attributes of the entity. The role of each attribute determines whether it is identifying or descriptive; see “How to define identifying attributes?” below for what these roles mean.

Note: Declaring Entity relationships is not yet supported.

Declaring associations between signals and entities

You can declare which entities should be used with specific observability signals. For example, process metrics should be used with the process entity, so that each metric is associated with a known process. To declare this, use the entity_associations field on the signal and reference one or more entities by name.

model/{my_domain}/metrics.yaml:

groups:
  - id: metric.some_metric
    type: metric
    ...
    entity_associations: 
      - my_entity

Notes:

A stable signal cannot declare an association with an unstable entity.
You can declare multiple associations. These form a “one or many” set: one or more of the named entities may be associated with the signal. There is no requirement that exactly one entity be attached to a signal.
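The two notes above can be read as validation rules over the model. The Python sketch below is illustrative only. It assumes a hypothetical in-memory form of the model: plain dicts that mirror the YAML fields id, name, stability and entity_associations. It is not the semantic conventions tooling, and it simplifies stability to "stable" versus anything else.

```python
# Illustrative sketch only: groups are plain dicts mirroring the YAML fields
# (id, name, stability, entity_associations). Not the official semconv tooling.


def check_associations(signal: dict, entities_by_name: dict) -> list:
    """Return the problems found in a signal's entity_associations."""
    problems = []
    # Several names may be listed; together they form a "one or many" set.
    for name in signal.get("entity_associations", []):
        entity = entities_by_name.get(name)
        if entity is None:
            problems.append(f"{signal['id']}: unknown entity '{name}'")
            continue
        # A stable signal must not reference an unstable entity.
        if signal["stability"] == "stable" and entity["stability"] != "stable":
            problems.append(
                f"{signal['id']}: stable signal references unstable entity '{name}'"
            )
    return problems


entities = {
    "my_entity": {"name": "my_entity", "stability": "development"},
    "my_stable_entity": {"name": "my_stable_entity", "stability": "stable"},
}
metric = {
    "id": "metric.some_metric",
    "stability": "stable",
    "entity_associations": ["my_entity", "my_stable_entity"],
}
print(check_associations(metric, entities))
```

Here the stable metric is flagged only for my_entity, which is still in development. The association with the stable entity is accepted.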

Extending an entity

While not recommended for Semantic Conventions, you can define a new “view” of an entity that includes additional descriptive attributes. To do so, use the extends field on groups:

model/{my_other_domain}/entities.yaml:

groups:
  - id: entity.my_entity_2
    type: entity
    extends: entity.my_entity
    attributes:
      - ref: new.attribute.name
        requirement_level: opt_in
        role: descriptive

Notes:

You cannot change the name or type fields of the new entity.
You cannot change the set of identifying attributes.
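These two constraints can be checked mechanically. The following is a minimal Python sketch under two assumptions: the base and extending groups are available as plain dicts shaped like the YAML above, and the helper names are hypothetical. It rejects a changed name or type. It also rejects any change to the identifying set, whether by adding a new identifying attribute or by re-declaring an existing one with a different role.

```python
# Illustrative sketch: groups are plain dicts shaped like the YAML model files.


def identifying_refs(group: dict) -> set:
    """Attribute refs the group declares with role: identifying."""
    return {
        a["ref"] for a in group.get("attributes", []) if a.get("role") == "identifying"
    }


def check_extension(base: dict, ext: dict) -> list:
    """Return the rule violations of an entity group that extends base."""
    problems = []
    # The extending group may repeat name/type, but must not change them.
    for key in ("name", "type"):
        if key in ext and ext[key] != base.get(key):
            problems.append(f"{ext['id']}: cannot change {key}")
    base_ids = identifying_refs(base)
    for attr in ext.get("attributes", []):
        ref, role = attr["ref"], attr.get("role")
        if role == "identifying" and ref not in base_ids:
            problems.append(f"{ext['id']}: cannot add identifying attribute {ref}")
        if ref in base_ids and role != "identifying":
            problems.append(f"{ext['id']}: cannot demote identifying attribute {ref}")
    return problems


base = {
    "id": "entity.my_entity",
    "type": "entity",
    "name": "my_entity",
    "attributes": [
        {"ref": "some.attribute", "role": "identifying"},
        {"ref": "some.other_attribute", "role": "descriptive"},
    ],
}
ext = {
    "id": "entity.my_entity_2",
    "type": "entity",
    "extends": "entity.my_entity",
    "attributes": [{"ref": "new.attribute.name", "role": "descriptive"}],
}
print(check_extension(base, ext))  # []
```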

FAQ

When to define a new entity?

There are two scenarios where entities should be defined:

When you are generating a new signal (log, metric, span, etc.) and no existing entity makes sense as the “source”.
(future) When you need to describe an entity hierarchy from some system of record (e.g. resources in kubernetes, assets in a cloud).

For example, if a new clustering solution (e.g. HashiCorp’s Nomad) is being modelled and the existing container-based entities are not enough, then new entities should be defined.

What is an “is-a” relationship?

OpenTelemetry, as an open ecosystem, cannot understand and model all possible entities that exist in the world. Instead, we allow overlapping definitions across domains. For example, both the container and k8s.container entities exist: generally every k8s.container is a container, but not every container runs in kubernetes.

An “is-a” relationship denotes that one entity describes the exact same system component as another entity, but from a different domain. In the above example, k8s.container models containers from the kubernetes domain, while container is a general model for containers, regardless of how they are run (e.g. podman, docker, kubernetes, FaaS, etc.).

“Is-a” relationships let OpenTelemetry fully model a subset of entities (e.g. all known k8s resources as entities), while still allowing the wider ecosystem to grow and evolve with new entities in the future.

When to define an “is-a” relationship vs. extending descriptive attributes?

There are two key rules:

Default to introducing separate entities with a clear “is a” (or similar) relationship.
Extend an entity with new descriptive attributes if, and only if, both of the following are true:
- The extending entity cannot be associated with any telemetry by itself.
  - Example 1: When adding a windows.process entity, you do not expect to create any process metrics or logs that are specific to windows.process; instead, all data is still reported against the process entity.
  - Example 2: When adding a docker.container entity, you do not expect to create any container-specific signals; instead, logs, metrics and spans are reported against the container entity.
- The extending entity doesn’t have any other entities that logically can only be associated with it. Note: this only applies to entities as signals with relationships.

This avoids the complexities of subtyping and ambiguous attribute usage.
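The two rules reduce to a simple decision procedure. This Python sketch is a hypothetical helper, not part of any OpenTelemetry tool. The two boolean inputs stand for the judgments described above, and a modeller still has to make those judgments by hand.

```python
# Hypothetical helper that encodes the two rules above; not an OpenTelemetry API.


def modelling_choice(own_telemetry: bool, has_dependent_entities: bool) -> str:
    """Return "extend" only if both extension conditions hold, else "is-a"."""
    # Extending is allowed only when the new entity never carries telemetry
    # by itself and no other entity can only be associated with it.
    if not own_telemetry and not has_dependent_entities:
        return "extend"
    # Default: introduce a separate entity with an "is-a" relationship.
    return "is-a"


# windows.process: all data is still reported against the process entity.
print(modelling_choice(own_telemetry=False, has_dependent_entities=False))  # extend
# A hypothetical entity with its own metrics gets a separate "is-a" entity.
print(modelling_choice(own_telemetry=True, has_dependent_entities=False))  # is-a
```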

How to define identifying attributes?

The identifying attributes should be minimally sufficient to identify an entity within the context of how that entity is discovered. For example, when discovering kubernetes entities such as k8s.pod and k8s.deployment, the identifying attributes should be sufficient to identify these entities within the scope of a kubernetes cluster (or more specifically, the kubernetes API server where the entities are discovered).

Commonly, a number of attributes of an entity are readily available for the telemetry producer to compose an identity. Of the available attributes, the entity ID should include only the minimal set that is sufficient to uniquely identify that entity. For example, a process on a host can be uniquely identified by the (process.pid, process.creation.time) attributes. Adding, for example, the process.executable.name attribute to the identity is unnecessary and violates the rule of having a minimally sufficient ID.
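“Minimally sufficient” can be made concrete as two conditions on a candidate attribute set: it must be unique across entities, and no proper subset of it may also be unique. The Python sketch below checks both conditions against a sample of observed entities. The function names and sample values are hypothetical. A sample can only disprove uniqueness, not prove it, so treat this as a heuristic aid and not a substitute for knowing how the entity is identified.

```python
from itertools import combinations


def is_unique(records: list, attrs: tuple) -> bool:
    """True if no two records share the same values for attrs."""
    keys = [tuple(r[a] for a in attrs) for r in records]
    return len(set(keys)) == len(keys)


def is_minimally_sufficient(records: list, attrs: tuple) -> bool:
    """attrs must be unique, and no proper subset of attrs may be unique."""
    if not is_unique(records, attrs):
        return False
    for size in range(len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(records, subset):
                return False
    return True


# Observed processes on one host; PID 42 was reused by a later process.
processes = [
    {"process.pid": 42, "process.creation.time": "2024-01-01T00:00:00Z",
     "process.executable.name": "nginx"},
    {"process.pid": 42, "process.creation.time": "2024-01-02T00:00:00Z",
     "process.executable.name": "nginx"},
    {"process.pid": 43, "process.creation.time": "2024-01-01T00:00:00Z",
     "process.executable.name": "bash"},
]
print(is_minimally_sufficient(processes, ("process.pid",)))  # False: PID reuse
print(is_minimally_sufficient(
    processes, ("process.pid", "process.creation.time")))  # True
print(is_minimally_sufficient(
    processes, ("process.pid", "process.creation.time",
                "process.executable.name")))  # False: name is redundant
```

The subset search is exponential in the number of candidate attributes. That is acceptable here because identities should only ever involve a handful of attributes.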

Identifying attributes generally determine the lifespan of an entity. This is particularly important for metrics written against an entity: the lifespan determines whether a timeseries remains “connected” between reported points, or whether it suddenly appears to drop off. It is recommended to select an identity that keeps the lifespan “stable” for important alerting and monitoring use cases.

Identifying attributes MUST NOT change during the lifespan of the entity.
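To see why identity determines lifespan, consider how a backend might group metric points into timeseries by the identifying attributes of their entity. The Python sketch below uses hypothetical attribute names (my_entity.id, my_entity.status) and a simplified point format. It shows that putting a changing attribute into the identity splits the data of one entity into separate series.

```python
from collections import defaultdict


def series_by_identity(points: list, identifying: list) -> dict:
    """Group point values into series keyed by the identifying attribute values."""
    series = defaultdict(list)
    for p in points:
        key = tuple(p["resource"][a] for a in identifying)
        series[key].append(p["value"])
    return dict(series)


# Hypothetical entity: my_entity.id never changes, my_entity.status does.
points = [
    {"resource": {"my_entity.id": "a1", "my_entity.status": "starting"}, "value": 1},
    {"resource": {"my_entity.id": "a1", "my_entity.status": "running"}, "value": 2},
    {"resource": {"my_entity.id": "a1", "my_entity.status": "running"}, "value": 3},
]
print(series_by_identity(points, ["my_entity.id"]))
# {('a1',): [1, 2, 3]}  one connected series
print(series_by_identity(points, ["my_entity.id", "my_entity.status"]))
# {('a1', 'starting'): [1], ('a1', 'running'): [2, 3]}  the series breaks
```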

Multi-Observer Guidance

When choosing identifying attributes, care should be taken to ensure that multiple observers will find the same identifying attributes for the same entity. Entities may be discovered both within OpenTelemetry SDKs and by the Collector, so they should use identifying attributes that are the same for both of these signal providers.

For example, service.instance.id can be difficult to detect consistently both inside and outside an SDK. Generally, this can only be achieved if some outside source injects an externally visible service.instance.id value into the SDK. An alternative is for the SDK to provide a relationship between the service.instance.id and another entity that is visible externally. Care should be taken when modelling entities to avoid this problem where possible.

The choice of service.instance.id should be the exception, not the rule, for most entities being modelled. Service instancing is a fundamental feature of OpenTelemetry, and we think it is a critical “fallback” identity. It works best when there is a single generator of the ID shared across all observers. In practice, however, this is difficult or “non-standard” in the following scenarios:

Prometheus pull-based metrics, where the instance label should match the service.instance.id on push-based OTLP data.
Reading container logs from a k8s.node, where we know the container name and deployment, but cannot see into the SDK to learn the chosen instance ID.

The OpenTelemetry Operator and the onboarding guides for kubernetes, for example, use mechanisms that ensure a service.instance.id can be pushed down to SDKs and to external observers, alleviating this friction for kubernetes.
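The multi-observer problem can be illustrated with two hypothetical views of the same workload. The first is what an SDK reports. The second is what a Collector reading container logs on the node can see. In this Python sketch, identity is compared only on the chosen identifying attributes. The attribute values, and the split of which observer sees what, are assumptions for illustration.

```python
def identity(resource: dict, identifying: list):
    """Identity tuple, or None if this observer cannot see every attribute."""
    if not all(a in resource for a in identifying):
        return None
    return tuple(resource[a] for a in identifying)


def same_entity(view_a: dict, view_b: dict, identifying: list) -> bool:
    """True only if both observers derive the same, complete identity."""
    key_a = identity(view_a, identifying)
    return key_a is not None and key_a == identity(view_b, identifying)


# What the SDK inside the container reports (hypothetical values).
sdk_view = {"service.instance.id": "instance-123", "k8s.pod.uid": "pod-uid-1"}
# What a Collector reading container logs on the node can see.
collector_view = {"k8s.pod.uid": "pod-uid-1", "k8s.container.name": "app"}

print(same_entity(sdk_view, collector_view, ["k8s.pod.uid"]))  # True
print(same_entity(sdk_view, collector_view, ["service.instance.id"]))  # False

# If the ID is injected so that both observers can see it:
collector_view["service.instance.id"] = "instance-123"
print(same_entity(sdk_view, collector_view, ["service.instance.id"]))  # True
```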

How to namespace entities?

Entities (both types and attributes) should be namespaced around the primary mechanism used to identify the entity. For example, kubernetes entities use the k8s namespace, and are primarily discovered using the kubernetes API or by working within kubernetes.

See General Naming Guidance for overall semantic convention namespacing rules.

Background: Resource and Entities

In OpenTelemetry, every signal is associated with a Resource. The specification defines it as follows:

A Resource is a representation of the entity producing telemetry. Within OpenTelemetry, all signals are associated with a Resource, enabling contextual correlation of data from the same source. For example, if I see a high latency in a span I need to check the metrics for the same entity that produced that Span during the time when the latency was observed.
Resource provides two important aspects for observability:
It MUST identify an entity that is producing telemetry.
It SHOULD allow users to determine where that entity resides within their infrastructure.

All resources are composed of Entities. An entity is specified as:

Entity represents an object of interest associated with produced telemetry: traces, metrics, logs, profiles etc.

While there is overlap in the definition of Entity and Resource, there are several key differences between the two:

An Entity has a known “type”, e.g. service, k8s.pod, host, etc.
An Entity can distinguish identifying attributes from descriptive attributes.
- Identifying attributes can be used to identify the entity within some system (see minimally sufficient id). For example, the k8s.pod.uid would be considered an identifying attribute for a pod within kubernetes.
- Descriptive attributes can be used to provide additional labels for entities, but are not necessary to uniquely identify the Entity.
A Resource is composed of multiple Entities.
- Each of the entities within a Resource is considered to be ‘contributing’ to that telemetry.
- For example, most SDKs today include the service entity as well as another entity, such as k8s.container or host.
An Entity may be conceptually similar to another (which we call an “is-a” relationship).
- For example, the k8s.cluster entity generically represents kubernetes clusters, while the aws.eks.cluster entity would represent the AWS-specific concept of an Amazon Elastic Kubernetes Service (EKS) cluster.
- In this case, a Resource from EKS could contain both the aws.eks.cluster entity and the k8s.cluster entity.
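The relationship between a Resource and its entities can be sketched as a small data model. This Python sketch is hypothetical and is not the SDK or OTLP API. The split of the example attributes into identifying and descriptive is also assumed for illustration. The sketch shows a Resource holding several contributing entities, including two related by an “is-a” relationship. Their attributes flatten into a single attribute set.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    type: str
    identifying: dict
    descriptive: dict = field(default_factory=dict)


@dataclass
class Resource:
    entities: list

    def entity_types(self) -> list:
        return [e.type for e in self.entities]

    def attributes(self) -> dict:
        """Flatten all entity attributes; later entities win on key conflicts."""
        merged = {}
        for e in self.entities:
            merged.update(e.identifying)
            merged.update(e.descriptive)
        return merged


# Hypothetical attribute assignments, for illustration only.
resource = Resource(entities=[
    Entity("service", {"service.name": "checkout", "service.instance.id": "i-1"}),
    Entity("k8s.cluster", {"k8s.cluster.uid": "c-1"}, {"k8s.cluster.name": "prod"}),
    Entity("aws.eks.cluster", {"aws.eks.cluster.arn": "arn:example"}),
])
print(resource.entity_types())  # ['service', 'k8s.cluster', 'aws.eks.cluster']
print(len(resource.attributes()))  # 5
```

Flattening loses the entity boundaries, which is why the distinction between identifying and descriptive attributes has to be carried separately from the flat attribute map.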

There are two key principles that are important for Entities and Resource in OpenTelemetry:

Open expansion: Allowing users outside of OpenTelemetry to provide Entity definitions and relationships within the system.
Telescoping Identity: Allowing flexible denormalization of observability data to optimise critical queries (e.g. alerts, dashboards, etc.)

Open Expansion

OpenTelemetry is designed to be an open system. When it comes to defining the core set of entities and relationships within systems, it needs to remain open about what these entities and possible relationships are. Users should be able to model any system they run and have it participate alongside existing OpenTelemetry semantic conventions. This is done through two key aspects:

Namespacing
“Is-a” Relationships

When defining a new set of entities within OpenTelemetry Semantic Conventions, they should be namespaced, as per the Semantic Convention naming policy. This clearly indicates which concepts are related to each other. For example, the k8s namespace would define kubernetes-related entities and their relationships. Users would know to create a new namespace when modelling concepts on top of k8s.

Expansion to existing concepts is done through “is-a” relationships. These are relationships where one entity is known to represent the same concept as another entity, but in some new scoped context. For example, an aws.eks.cluster is a k8s.cluster, but not all k8s.cluster entities are aws.eks.cluster entities.
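Declaring entity relationships is not yet supported in the model. The following Python sketch therefore uses a hypothetical lookup table to show how a consumer might use is-a relationships. A query for k8s.cluster entities also matches aws.eks.cluster entities, but not the other way around. The walk follows chains of is-a edges, so it would also handle a type that is-a something that in turn is-a something else.

```python
# Hypothetical is-a table: each key "is a" every entity type in its set.
IS_A = {
    "aws.eks.cluster": {"k8s.cluster"},
    "k8s.container": {"container"},
}


def is_a(entity_type: str, wanted: str, graph: dict = IS_A) -> bool:
    """True if entity_type is wanted, directly or through a chain of is-a edges."""
    stack, seen = [entity_type], set()
    while stack:
        current = stack.pop()
        if current == wanted:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return False


print(is_a("aws.eks.cluster", "k8s.cluster"))  # True
print(is_a("k8s.cluster", "aws.eks.cluster"))  # False: not every cluster is EKS
```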

Telescoping Identity

Within OpenTelemetry, we want to give users the flexibility to decide what information needs to be sent with observability signals and what information can be joined later. We call this “telescoping identity”: users can decide how large an OpenTelemetry resource will be on the wire (and correspondingly, how large data points may be when stored, depending on the storage solution).

For example, in the extreme, OpenTelemetry could synthesize a UUID for every system that produces telemetry. All identifying attributes for Resource and Entity could then be sent via a side channel with known relationships to this UUID. While this would optimise the runtime generation and sending of telemetry, it comes at the cost of downstream storage systems needing to join the data back together at either ingestion time or query time. For high-performance use cases, e.g. alerting, these joins can be expensive.
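The trade-off can be sketched in Python. In the “normalized” extreme, each point's full resource is replaced by a synthesized UUID plus an entry in a side table, and a backend must join the two back together before the data is usable. The function names and point format are hypothetical, and real pipelines and storage formats differ.

```python
import uuid


def normalize(points: list):
    """Send a small resource ID per point; full attributes go to a side table."""
    side_table, ids, slim = {}, {}, []
    for p in points:
        key = tuple(sorted(p["resource"].items()))
        if key not in ids:
            ids[key] = str(uuid.uuid4())
            side_table[ids[key]] = dict(p["resource"])
        slim.append({"resource_id": ids[key], "value": p["value"]})
    return slim, side_table


def join(slim: list, side_table: dict) -> list:
    """The ingestion- or query-time join a backend must perform."""
    return [{"resource": side_table[p["resource_id"]], "value": p["value"]}
            for p in slim]


points = [
    {"resource": {"service.name": "checkout", "host.name": "h1"}, "value": 1},
    {"resource": {"service.name": "checkout", "host.name": "h1"}, "value": 2},
    {"resource": {"service.name": "cart", "host.name": "h2"}, "value": 3},
]
slim, side_table = normalize(points)
print(len(side_table))  # 2 distinct resources
print(join(slim, side_table) == points)  # True
```

Every query over the slim points must pay for the join. Sending the full resource with each point avoids that cost but makes each point larger on the wire and in storage.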

In practice, users control Resource identity via the configuration of Resource Detection within SDKs and the Collector. Users wishing for minimal identity may limit their resource detection to just a service.instance.id. Others heavily customize resource detection, appending many concepts.

OpenTelemetry should provide a good “out of the box” set of resource detection that makes appropriate denormalization trade-offs for most users, but allows users to fine-tune the system to their needs.
