OEP-68: Learning Content Identifiers#
| OEP | |
| Title | Learning Content Identifiers |
| Last Modified | 2026-04-10 |
| Authors | Kyle McCormick <kyle@axim.org> |
| Arbiter | Sarina Canelake <sarina@axim.org> |
| Status | Accepted |
| Type | Best Practice |
| Created | 2026-03-28 |
| Review Period | 2026-03-28 - 2026-04-15 |
| Resolution | |
Abstract#
Open edX code has four kinds of identifiers for learning content: Integer Primary Keys, Codes, OpaqueKeys, and UUIDs. Choosing the wrong kind—or naming it ambiguously—can cause subtle bugs, broken exports, and painful code reviews. This OEP explains what each kind is, when to reach for it, and how to name it consistently.
The primary audience is developers working on openedx-core, openedx-platform, and other repositories that manage learning content. Plugin developers, site administrators, and nontechnical Open edX Core Contributors who work with learning content will also find these concepts useful.
Motivation#
Identifiers show up everywhere: Python variables, function parameters, Django model fields, database columns, REST API fields, event data schemas. Using the right identifier for the right job matters. Here are some ways things can go wrong when the wrong kind is used:
Instance-specific identifiers break exports. Integer primary keys are assigned by a specific database and are meaningless anywhere else. For example, if exported course content references the primary key of a learner team (rather than the team’s slug), those content-team relationships become invalid the moment the content is imported into a different instance.
Context-specific identifiers break reuse. When copying a component and its media files from course run X to course run Y, the transfer format must describe those relationships without mentioning X or Y by name. Otherwise the copied component in Y may erroneously try to reference media files from course run X.
Mixing version-aware and version-agnostic identifiers breaks lookups. If learner-facing code queries a cache using a version-agnostic key, but cache entries were stored under version-specific keys, the cache will never be hit.
String foreign keys make databases slow. Long strings make indexes unnecessarily large. The table joining components to media files uses integer primary keys internally—only at export time are those keys resolved into portable strings.
Consistent naming also makes code review easier. Type annotations help, but a lot of
Open edX code operates in contexts where types aren’t checked: REST APIs, frontend JavaScript,
tracking events, legacy Python modules, and log output. When a variable is named collection_key
you immediately know it holds an OpaqueKey; when it’s named collection_id you know it’s a
database integer. This OEP aims to:
Help developers make informed decisions about which kind of identifier to use.
Give every identifier a name that makes its kind obvious—to reviewers, future developers, site admins, data researchers, and power users—even when no type annotations are present.
Specification#
Open edX code uses four categories of learning content identifier. When working with an identifier, figure out which category it belongs to, decide whether it’s the right fit for the job, and then apply the naming convention below. Use your judgement when adopting conventions in existing code—backwards compatibility and consistency with surrounding code come first.
These conventions apply everywhere learning content is referenced: Python variables, JS variables, Django model field names, REST API arguments, event schema fields, admin interfaces, application logs, and so on.
Summary#
| Category | What it is | Python type | Naming convention | Storage |
|---|---|---|---|---|
| Integer Primary Key | Auto-incremented integer row identifier | int | id, _id | BigAutoField |
| Code | Locally-scoped slug-like string | str | _code | CharField |
| OpaqueKey | Codes composed together into a semi-readable instance-wide identifier | subclass of OpaqueKey | _key | OpaqueKeyField subclass |
| UUID | Globally unique identifier scoped across all Open edX instances | uuid.UUID | _uuid | UUIDField |
Integer Primary Keys#
Every Open edX Django model should declare an auto-incrementing integer primary key.
When to use: Primary keys are the default way to reference a database row within a single deployed service on a single instance. Always use them for Django model foreign key relationships—as integers, they can be indexed with almost no overhead, making lookups, joins, and constraint enforcement as fast as possible. The trade-off is that primary keys are meaningless outside the database that assigned them.
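As a rough illustration of this trade-off, the sketch below uses Python's built-in sqlite3 module rather than Django. The table and column names (component, media, component_media) are hypothetical. Relationships are stored as integer keys, and only the export step resolves them into portable codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE component (id INTEGER PRIMARY KEY AUTOINCREMENT, component_code TEXT NOT NULL);
    CREATE TABLE media (id INTEGER PRIMARY KEY AUTOINCREMENT, media_code TEXT NOT NULL);
    -- The join table references rows by integer primary key only.
    CREATE TABLE component_media (
        component_id INTEGER NOT NULL REFERENCES component(id),
        media_id INTEGER NOT NULL REFERENCES media(id)
    );
    """
)
component_id = conn.execute(
    "INSERT INTO component (component_code) VALUES (?)", ("Atoms6",)
).lastrowid
media_id = conn.execute(
    "INSERT INTO media (media_code) VALUES (?)", ("periodic-table.png",)
).lastrowid
conn.execute(
    "INSERT INTO component_media (component_id, media_id) VALUES (?, ?)",
    (component_id, media_id),
)


def export_links(connection: sqlite3.Connection) -> list[tuple[str, str]]:
    """Resolve integer keys into portable codes, only at export time."""
    rows = connection.execute(
        """
        SELECT component.component_code, media.media_code
        FROM component_media
        JOIN component ON component.id = component_media.component_id
        JOIN media ON media.id = component_media.media_id
        ORDER BY component.component_code, media.media_code
        """
    )
    return [(component_code, media_code) for component_code, media_code in rows]


exported_links = export_links(conn)
```

The integers in component_id and media_id mean nothing outside this database, whereas the exported pairs of codes can be re-linked after import into another instance.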
How to name: Use id and the _id suffix (e.g. collection_id) everywhere:
Django model fields, variable names, REST APIs, event schemas, and so on. When accessing the
primary key on a Django model instance, use .id (e.g. collection.id).
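from dataclasses import dataclass
from typing import Iterable


@dataclass
class TeamMembershipInfo:
    user_id: int  # integer primary key of the user
    user_fullname: str
    team_id: int  # integer primary key of the team
    team_name: str


def get_memberships(user_id: int) -> Iterable[TeamMembershipInfo]: ...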
How to store: By default, use django.db.models.BigAutoField for all primary keys.
In rare cases—when a model has very few rows (like an enumeration) or receives a massive
number of foreign key references—use the smaller django.db.models.AutoField instead.
Codes#
A code is a short, slug-like string that identifies something within a specific enclosing context. Codes typically contain alphanumeric characters, hyphens, underscores, and periods, and are generally case-sensitive. A code alone is not globally unique—its meaning depends on the scope it lives in. Multiple codes can be combined to identify something in a broader scope.
For example:
A block_code distinguishes one XBlock usage from others of the same type within a learning context.
A block_code and type_code together uniquely identify an XBlock usage among all blocks in a learning context.
An org_code, course_code, run_code, block_code, and type_code together uniquely identify an XBlock usage among all blocks on an Open edX instance.
When to use: Codes are the right choice when you want an identifier that is descriptive
but deliberately limited in scope. This makes them ideal for transfer formats, where
leaving information out is just as important as including it. For example, OLX links blocks
together using block_code and type_code, but intentionally omits
org_code/course_code/run_code—so the OLX can be imported into any instance, course, or
library without modification.
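A minimal sketch of why leaving the context out matters. The BlockLink dataclass and place_in_context function are hypothetical and not part of any OLX tooling; they only show that the same scope-limited reference can be resolved inside whichever context imports it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockLink:
    """A portable reference: no org_code, course_code, or run_code."""

    type_code: str
    block_code: str


def place_in_context(
    link: BlockLink, org_code: str, course_code: str, run_code: str
) -> tuple[str, ...]:
    """Combine a portable link with the importing context's codes."""
    return (org_code, course_code, run_code, link.type_code, link.block_code)


link = BlockLink(type_code="problem", block_code="Atoms6")
in_spring = place_in_context(link, "Axim", "Chem101", "Spring2026")
in_fall = place_in_context(link, "Axim", "Chem101", "Fall2026")
```

Because the link itself never names a course run, the same exported data works unchanged in both runs.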
How to name: Variables and fields holding a code should use the suffix _code.
Historically, codes have been called “slugs” or “shortnames”, and existing code may use
suffixes like _slug, _id, or no suffix at all (e.g. org, run,
block_type). The suffix _code is preferred for new code.
How to store: By default, store codes in a case-sensitive CharField of length 255
with a regex validator. A factory function is available at openedx_django_lib.fields.code_field.
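For illustration only, a validator matching the character set described earlier in this section could look like the following. The exact pattern used by openedx_django_lib.fields.code_field is not shown in this OEP, so CODE_PATTERN, MAX_CODE_LENGTH, and is_valid_code are assumptions.

```python
import re

# Assumed pattern: alphanumerics, hyphens, underscores, and periods.
CODE_PATTERN = re.compile(r"[A-Za-z0-9._-]+")
MAX_CODE_LENGTH = 255


def is_valid_code(value: str) -> bool:
    """Return True if value looks like a code under the assumed rules."""
    return len(value) <= MAX_CODE_LENGTH and CODE_PATTERN.fullmatch(value) is not None
```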
OpaqueKeys#
An OpaqueKey (defined in openedx/opaque-keys) is an immutable Python object that
bundles a “key type” and one or more codes to uniquely identify a resource within an Open edX
instance. Think of it as a structured, semi-human-readable instance-scoped identifier.
OpaqueKey is an abstract base class organized into a hierarchy: abstract intermediate
classes represent broad concepts (like LearningContextKey and UsageKey), while concrete
subclasses represent specific resource types (like CourseLocator and LibraryUsageLocatorV2).
Each concrete type serializes to a predictable string format.
A note on cross-instance usage: OpaqueKeys are designed for use within a single instance,
but the same OpaqueKey can naturally appear in multiple instances. For example, if you export a
course with org code Axim, course code Chem101, and run code Spring2026 from one
instance and import it elsewhere using the same codes, all the blocks will have identical
OpaqueKeys. This is useful: external tools like catalog integrations, sync workflows, and
reporting scripts can use OpaqueKeys to correlate data across instances.
However, OpaqueKeys have limits. Content can diverge after import, learner data stays instance-specific, and Open edX doesn’t enforce any global meaning for OpaqueKeys across instance boundaries. If you need an identifier that is truly unique across all instances, use UUIDs instead.
For example:
"course-v1:Axim+Chem101+Spring2026" contains: course-v1 (key type: “course run”), Axim (the org code), Chem101 (the course code), and Spring2026 (the run code).
"lb:Axim:ChemLib:problem:Atoms6" contains: lb (key type: “library block”), Axim (the org code), ChemLib (the library code), problem (the type code), and Atoms6 (the component/block code).
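To make the structure of the course run example concrete, the sketch below splits such a key string into its parts using only the standard library. It is not the opaque-keys implementation, and the names CourseRunParts and split_course_run_key are hypothetical; real code should call CourseKey.from_string instead.

```python
from typing import NamedTuple


class CourseRunParts(NamedTuple):
    key_type: str
    org_code: str
    course_code: str
    run_code: str


def split_course_run_key(course_key_str: str) -> CourseRunParts:
    """Split 'course-v1:ORG+COURSE+RUN' into its key type and codes."""
    key_type, _, rest = course_key_str.partition(":")
    codes = rest.split("+")
    if key_type != "course-v1" or len(codes) != 3 or not all(codes):
        raise ValueError(f"not a course run key: {course_key_str!r}")
    return CourseRunParts(key_type, *codes)


parts = split_course_run_key("course-v1:Axim+Chem101+Spring2026")
```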
When to use: Both integer primary keys and OpaqueKeys uniquely identify a resource within an instance, so when do you choose one over the other?
Use primary keys for database relationships. They’re far more efficient for joins, lookups, and constraint enforcement.
Use OpaqueKeys where quasi-readability matters. OpaqueKey strings carry semantic information, making them a better fit for URLs, error logs, event data, and admin or power-user UIs.
Prefer fewer internal OpaqueKey references for flexibility. Because OpaqueKeys embed human-readable codes, a site admin might want to rename a course and change its run code. This is rare, but keeping internal OpaqueKey references to a minimum makes the platform more adaptable to these operations in the future.
How to name:
Python variables and attributes holding a parsed OpaqueKey object → _key suffix.
Django Model Fields and Serializer Fields that convert between OpaqueKey objects and strings → _key suffix.
REST APIs, event data fields, and other external representations → _key suffix.
When parsed objects and raw strings co-exist in the same context (e.g. a parsing function) → use _key_str for the raw string to disambiguate.
Frontend variables → *Key suffix. *KeyString is not needed because parsed OpaqueKey objects don’t exist on the frontend.
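from opaque_keys.edx.keys import CourseKey, UsageKey
from opaque_keys.edx.locator import LibraryCollectionLocator

# Preferred
course_key: CourseKey = CourseKey.from_string("course-v1:Axim+Demo+2026")
usage_key: UsageKey = ...
collection_key: LibraryCollectionLocator = ...

# CourseOverview is a model defined in openedx-platform.
def get_course(course_key: CourseKey) -> CourseOverview: ...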
Prefer passing parsed OpaqueKey objects over raw strings whenever possible—they’re
type-safe and keep all parsing logic in one place. Use the specific OpaqueKey subclass
as a type annotation wherever it’s known.
How to store: The best way to store an OpaqueKey is an OpaqueKeyField subclass such
as UsageKeyField, providing automatic marshalling between OpaqueKey objects and their
string representations in SQL VARCHARs.
💡 Historical note: Concrete OpaqueKey subclasses use the suffix Locator instead of
Key for historical reasons. This distinction can be ignored by consumers—they’re all
_keys. We plan to rename all *Locator classes to *Key in the future.
⚠️ Historical hazard: Older types like CourseLocator can represent both
version-aware and version-agnostic identifiers. This has led to all sorts of bugs and is
widely considered a mistake. New OpaqueKey classes should refer to exactly one kind of
resource—version information must be either always present or always absent, never optional.
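One way to follow this rule is to model the two cases as separate types. The class names below are hypothetical and are not existing opaque-keys classes. The sketch only shows that when version information is never optional, a version-agnostic key and a version-aware key are different types and never compare equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleCourseKey:
    """Version-agnostic: never carries a version."""

    org_code: str
    course_code: str
    run_code: str


@dataclass(frozen=True)
class ExampleCourseVersionKey:
    """Version-aware: always carries a version."""

    course_key: ExampleCourseKey
    version_num: int


course_key = ExampleCourseKey("Axim", "Chem101", "Spring2026")
versioned_key = ExampleCourseVersionKey(course_key, version_num=3)

# Entries stored under version-aware keys are only found by version-aware keys.
cache = {versioned_key: "rendered outline v3"}
```

With distinct types, a function annotated to accept ExampleCourseVersionKey makes it visible in review, and to a static type checker, when a version-agnostic key is passed by mistake.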
UUIDs#
A UUID (Universally Unique Identifier) is a 128-bit identifier that is unique across all Open edX instances, not just one. Unlike primary keys and OpaqueKeys, UUIDs have no dependency on any particular database or deployment.
When to use: Use UUIDs when an object needs an identity that is stable and globally unique across every Open edX instance, allowing the objects to be shared outside and across instances without risk of collision. For example:
Learner certificates awarded on different Open edX instances should have distinct UUIDs, even if their instance-local identifiers (course run key, user primary key) are identical.
Changelog entries should have distinct UUIDs, even if the changes are identical.
How to name:
Python variables and attributes holding a parsed uuid.UUID object → _uuid suffix.
Django Model Fields and Serializer Fields that convert between UUID objects and strings → _uuid suffix.
REST APIs, event data fields, and other external representations → _uuid suffix.
When parsed objects and raw strings co-exist in the same context → use _uuid_str for the raw string to disambiguate.
Frontend variables → *Uuid suffix. *UuidString is not needed because parsed UUID objects are not used in Open edX frontend code.
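A short sketch of these naming conventions using Python's standard uuid module. The certificate variable names are hypothetical.

```python
import uuid

# Two certificates with identical instance-local identifiers still get
# distinct UUIDs, because uuid4() values are randomly generated.
certificate_uuid: uuid.UUID = uuid.uuid4()
other_certificate_uuid: uuid.UUID = uuid.uuid4()

# Raw string form, e.g. as received from a REST API. The _uuid_str suffix
# disambiguates it from the parsed object in the same scope.
certificate_uuid_str: str = str(certificate_uuid)
parsed_certificate_uuid: uuid.UUID = uuid.UUID(certificate_uuid_str)
```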
How to store: The best way to store a UUID is a UUIDField.
Not every object needs a universal identity. Consider the need before defining a UUIDField.
Other Identifiers#
Not every identifier fits neatly into one of the four categories above. When that happens,
choose a name that avoids the reserved suffixes _id, _key, _key_str,
_code, _uuid, and _uuid_str. Using a different name signals clearly to
readers that this identifier has its own semantics and shouldn’t be treated as one of the
standard types.
Examples:
refname on PublishableEntity in openedx-core: this field correlates a database entity with its representation in off-platform content archives. It isn’t a primary key (those are database-specific), a code (it may contain non-slug characters), or an OpaqueKey (it can’t be parsed into an instance-scoped identifier). The name refname signals that it’s something distinct.
version_num: an integer used as part of the identity of several version-aware content models. It resembles a Code conceptually (it identifies a version within a versioned entity), but it’s an integer rather than a string, so _code would be misleading.
BlockRef: a 2-tuple of (type_code, block_code) that locally identifies a block usage within a learning context. Historically this was called BlockKey, which proved confusing because block_key is also used for UsageKeys, which identify a block across an entire instance.
Consequences#
Start: New conventions#
_code for codes (and the term “code” in general)
BlockRef and block_ref for 2-tuples of (type_code, block_code)
Stop: Old patterns to drop#
_id for OpaqueKeys (e.g. course_id)
_id for codes (e.g. block_id)
_key for codes (e.g. collection_key)
_key for non-OpaqueKey ref strings (e.g. PublishableEntity.key)
BlockKey and block_key
In OpaqueKeys, *Locator classes will be renamed to *Key
Continue: Already widely adopted#
id and _id for integer primary keys.
_key for OpaqueKey objects.
_uuid for UUID objects.
_key_str and _uuid_str for stringified OpaqueKeys and UUIDs.
Migration plan#
The guidance above applies immediately to new code.
Start retroactively applying guidance in openedx-core, which has few references to update.
Move on to opaque-keys, probably after Verawood.
Eventually, time permitting, consider updating existing variables and renaming model fields in openedx-platform.
Whenever renaming classes, keep old names as aliases to new ones.
Whenever renaming fields, use @property to make readonly backcompat aliases.
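The last two points might look like the following in plain Python. The class and field names are hypothetical; renaming a Django model field would additionally need a database migration, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class Team:
    team_code: str  # new, preferred name

    @property
    def slug(self) -> str:
        """Read-only backwards-compatible alias for team_code."""
        return self.team_code


class CollectionKey:
    """New, preferred class name."""


# Old name kept as an alias so existing imports keep working.
CollectionLocator = CollectionKey
```

Because the property defines no setter, old callers can still read team.slug, but any attempt to assign to it raises AttributeError, which nudges writers toward the new name.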
Rejected Alternatives#
pk and _pk for integer primary keys#
An earlier draft of this OEP recommended _pk for integer primary key variables and
.pk for accessing the primary key on a Django model instance. The argument was that
_pk leaves no doubt that the value is a database integer, whereas _id is overloaded.
That recommendation was dropped in favor of id and _id for two reasons:
Static typing of primary keys. A planned improvement to this OEP adds static typing to primary keys, for example Component.ID and CourseRun.ID as sub-types of int. These types can be specified on the .id model field, but a django-stubs limitation means .pk can only have type Any. The ability to statically type .id outweighs the explicitness of .pk.
Less code churn. Most existing Open edX code already uses .id and _id for primary keys, so adopting _id as the standard means fewer code changes when implementing this OEP’s guidelines.
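The planned typing mentioned in the first reason is not specified in this OEP. As one possible shape, typing.NewType can express integer sub-types that a static checker treats as distinct. The names ComponentID and CourseRunID below are illustrative assumptions, not the planned API.

```python
from typing import NewType

# Distinct static types that are plain ints at runtime.
ComponentID = NewType("ComponentID", int)
CourseRunID = NewType("CourseRunID", int)


def describe_component(component_id: ComponentID) -> str:
    return f"component #{component_id}"


component_id = ComponentID(42)
description = describe_component(component_id)
# A type checker such as mypy would flag describe_component(CourseRunID(42)),
# even though that call runs without error.
```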
Change History#
2026-04-10#
Replace _pk naming recommendation with _id for integer primary keys
Add Rejected Alternatives section
2026-03-23#
Initial proposal