Schemas

Pydantic boundary objects that define clean data flow boundaries between pipelines.

Enums

enums

Shared enumerations for boundary object schemas.

All enums are StrEnum for JSON-friendly serialization, ensuring that values survive round-trips through JSON/YAML without requiring custom serializers. Domain-extensible enums (EntityTypeEnum, RelationshipTypeEnum, PermissionTypeEnum) can be extended via domain overlay YAML in a future phase.

This module provides the single source of truth for all categorical values used across the five-pipeline architecture (ETL, Entity Resolution, Attribution Engine, API/MCP Server, Chat Interface). Enums are grouped by domain:

  • Core pipeline -- source identification, entity types, resolution
  • Attribution -- credit roles, assurance levels, provenance events
  • Permissions -- MCP Permission Patchbay types, values, scopes
  • Uncertainty -- uncertainty decomposition taxonomy, calibration
  • Commercial landscape -- training attribution, watermarking, revenue
  • Regulatory/compliance -- EU AI Act, ISO 42001, DSM Directive

See Also

Teikari, P. (2026). Music Attribution with Transparent Confidence. SSRN No. 6109087 -- sections 5-7 for assurance levels and permission patchbay design.

SourceEnum

Bases: StrEnum

Data source identifiers for the ETL pipeline.

Each value represents an external data source from which music metadata is ingested. The source identity is preserved throughout the entire pipeline so that downstream confidence scoring can apply per-source reliability weights. See Teikari (2026), section 5.

ATTRIBUTE DESCRIPTION
MUSICBRAINZ

MusicBrainz open database. Community-curated, high coverage for Western popular music. Provides MBIDs.

TYPE: str

DISCOGS

Discogs marketplace/database. Strong on vinyl releases, credits, and label information. Provides Discogs numeric IDs.

TYPE: str

ACOUSTID

AcoustID audio fingerprint service. Matches audio signals to MusicBrainz recordings via Chromaprint fingerprints.

TYPE: str

ARTIST_INPUT

Direct input from the artist or their representative. Highest authority for creative intent, but unverifiable by third parties.

TYPE: str

FILE_METADATA

Embedded file metadata (ID3 tags, Vorbis comments, etc.) extracted from the audio file itself. Quality varies widely.

TYPE: str

Examples:

>>> SourceEnum.MUSICBRAINZ
<SourceEnum.MUSICBRAINZ: 'MUSICBRAINZ'>
>>> SourceEnum("ARTIST_INPUT")
<SourceEnum.ARTIST_INPUT: 'ARTIST_INPUT'>

EntityTypeEnum

Bases: StrEnum

Music entity types within the knowledge graph.

Follows the MusicBrainz entity model with extensions for credits. Entity resolution produces a unified graph where each node is typed by one of these values.

ATTRIBUTE DESCRIPTION
RECORDING

A specific audio recording (a unique performance captured in a studio or live). Identified by ISRC.

TYPE: str

WORK

An abstract musical composition independent of any recording. Identified by ISWC. A single work may have many recordings.

TYPE: str

ARTIST

A person or group who creates or performs music. Identified by ISNI or IPI.

TYPE: str

RELEASE

A packaged product (album, single, EP) containing one or more recordings.

TYPE: str

LABEL

A record label or publishing entity that owns or distributes releases.

TYPE: str

CREDIT

A specific attribution credit linking an artist to a recording or work in a particular role (e.g., producer, songwriter).

TYPE: str

Examples:

>>> EntityTypeEnum.RECORDING
<EntityTypeEnum.RECORDING: 'RECORDING'>

RelationshipTypeEnum

Bases: StrEnum

Relationship types between music entities in the knowledge graph.

Edges in the entity graph are typed by these values. Each relationship connects a source entity (typically an artist) to a target entity (typically a recording or work). Relationship types map loosely to CreditRoleEnum but represent graph edges rather than attribution line items.

ATTRIBUTE DESCRIPTION
PERFORMED

Artist performed on a recording (vocalist, instrumentalist).

TYPE: str

WROTE

Artist wrote the underlying composition (songwriter, composer).

TYPE: str

PRODUCED

Artist produced the recording (creative/technical oversight).

TYPE: str

ENGINEERED

Artist served as recording engineer.

TYPE: str

ARRANGED

Artist arranged the composition for a specific performance.

TYPE: str

MASTERED

Artist mastered the final audio (mastering engineer).

TYPE: str

MIXED

Artist mixed the recording (mixing engineer).

TYPE: str

FEATURED

Artist is a featured guest on the recording.

TYPE: str

SAMPLED

Recording contains a sample from the target recording.

TYPE: str

REMIXED

Recording is a remix of the target recording.

TYPE: str

ResolutionMethodEnum

Bases: StrEnum

Entity resolution methods used to merge NormalizedRecords.

The Entity Resolution pipeline may use one or more of these methods to determine whether two NormalizedRecords refer to the same real-world entity. Methods are ordered roughly by computational cost and reliability. The chosen method is recorded on each ResolvedEntity for provenance tracing.

ATTRIBUTE DESCRIPTION
EXACT_ID

Exact match on a standard identifier (ISRC, ISWC, ISNI, MBID). Highest confidence, lowest cost.

TYPE: str

FUZZY_STRING

Fuzzy string matching on names/titles (e.g., Levenshtein, Jaro-Winkler). Handles typos and transliterations.

TYPE: str

EMBEDDING

Semantic similarity via vector embeddings (e.g., sentence transformers). Handles paraphrases and abbreviations.

TYPE: str

GRAPH

Graph-based resolution using relationship structure (e.g., Splink). Exploits co-occurrence patterns in the entity graph.

TYPE: str

LLM

LLM-assisted resolution for ambiguous cases. Most expensive, used as a fallback when other methods are inconclusive.

TYPE: str

MANUAL

Human-in-the-loop resolution by a domain expert. Triggered when automated methods produce low-confidence matches.

TYPE: str

AssuranceLevelEnum

Bases: StrEnum

Tiered provenance classification (A0--A3).

Maps to the assurance framework from Teikari (2026), section 6. Higher levels require stronger evidence chains. The assurance level determines how much trust downstream consumers can place in an attribution record.

Ordered by verification depth: LEVEL_0 < LEVEL_1 < LEVEL_2 < LEVEL_3.

ATTRIBUTE DESCRIPTION
LEVEL_0

No provenance data. Self-declared or unknown origin. Corresponds to A0 in the manuscript.

TYPE: str

LEVEL_1

Single source. Documented but not independently verified. Corresponds to A1. Typical for file-metadata-only records.

TYPE: str

LEVEL_2

Multiple sources agree. Cross-referenced and corroborated across at least two independent data sources. Corresponds to A2.

TYPE: str

LEVEL_3

Artist-verified or authority-verified. Highest assurance level. Requires explicit confirmation from the rights holder or an authoritative registry (e.g., ISNI). Corresponds to A3.

TYPE: str

Examples:

>>> AssuranceLevelEnum.LEVEL_3
<AssuranceLevelEnum.LEVEL_3: 'LEVEL_3'>
>>> AssuranceLevelEnum("LEVEL_0") < AssuranceLevelEnum("LEVEL_3")
True

ConflictSeverityEnum

Bases: StrEnum

Conflict severity levels between data sources.

When entity resolution encounters disagreements between sources for the same field (e.g., different release dates or artist names), the conflict is assigned a severity level that determines whether it can be auto-resolved or requires human review.

ATTRIBUTE DESCRIPTION
LOW

Minor discrepancy, auto-resolvable. Example: trailing whitespace differences in artist names.

TYPE: str

MEDIUM

Significant discrepancy requiring attention but not blocking. Example: different release dates within the same year.

TYPE: str

HIGH

Major disagreement likely indicating a data quality issue. Example: different artist names for the same ISRC.

TYPE: str

CRITICAL

Fundamental conflict that blocks attribution. Example: contradictory songwriter credits from authoritative sources. Always triggers needs_review = True.

TYPE: str

CreditRoleEnum

Bases: StrEnum

Credit roles for music attribution.

These roles appear in Credit objects within AttributionRecord and as prediction targets in conformal prediction sets. The taxonomy covers the most common roles found across MusicBrainz, Discogs, and industry metadata standards (DDEX, CWR).

ATTRIBUTE DESCRIPTION
PERFORMER

Primary performer (vocalist or lead instrumentalist).

TYPE: str

SONGWRITER

Songwriter (both music and lyrics). Use COMPOSER or LYRICIST for more specific roles.

TYPE: str

COMPOSER

Composed the music (melody, harmony, structure).

TYPE: str

LYRICIST

Wrote the lyrics/text.

TYPE: str

PRODUCER

Music producer (creative and/or technical oversight of the recording process).

TYPE: str

ENGINEER

Recording/tracking engineer.

TYPE: str

MIXING_ENGINEER

Mixing engineer (balance, EQ, effects in post-production).

TYPE: str

MASTERING_ENGINEER

Mastering engineer (final audio processing for distribution).

TYPE: str

ARRANGER

Arranged the composition for a specific performance context.

TYPE: str

SESSION_MUSICIAN

Session musician (hired instrumentalist, not a band member).

TYPE: str

FEATURED_ARTIST

Featured guest artist on the recording.

TYPE: str

CONDUCTOR

Orchestra or ensemble conductor.

TYPE: str

DJ

DJ (for electronic music, turntablism, or mix compilations).

TYPE: str

REMIXER

Created a remix of the original recording.

TYPE: str

ProvenanceEventTypeEnum

Bases: StrEnum

Provenance event types for the attribution audit trail.

Every ProvenanceEvent in an AttributionRecord is typed by one of these values. Together they form an immutable audit chain showing how an attribution was constructed and refined over time.

ATTRIBUTE DESCRIPTION
FETCH

Data fetched from an external source (ETL pipeline). Records which source was queried and how many records were returned.

TYPE: str

RESOLVE

Entity resolution step. Records the method used and the input/output record counts.

TYPE: str

SCORE

Confidence scoring/calibration step. Records the previous and new confidence values and the scoring method applied.

TYPE: str

REVIEW

Human review event. Records who reviewed, which feedback card was applied, and how many corrections were made.

TYPE: str

UPDATE

Record update event. Records version bump, fields changed, and what triggered the update.

TYPE: str

FEEDBACK

Feedback integration event. Records that a FeedbackCard was processed and whether its corrections were accepted.

TYPE: str

ReviewerRoleEnum

Bases: StrEnum

Feedback reviewer roles for the FeedbackCard system.

Identifies the domain expertise of the person providing feedback. The reviewer role affects how feedback is weighted during calibration updates -- artist-provided corrections carry higher authority than fan suggestions.

ATTRIBUTE DESCRIPTION
ARTIST

The artist themselves (or a confirmed representative). Highest authority for creative intent.

TYPE: str

MANAGER

Artist manager or business representative. Authority for contractual and commercial metadata.

TYPE: str

MUSICOLOGIST

Academic musicologist or music information retrieval expert. Authority for compositional analysis and historical context.

TYPE: str

PRODUCER

Music producer who worked on the recording. Authority for session credits and technical contributions.

TYPE: str

FAN

Community member / fan contributor. Valuable for crowd-sourced corrections but requires corroboration. Lowest weight.

TYPE: str

EvidenceTypeEnum

Bases: StrEnum

Evidence types supporting feedback corrections.

When a reviewer submits a FeedbackCard with corrections, they must indicate what evidence supports the correction. Evidence type affects the credibility weighting of the correction during calibration updates.

ATTRIBUTE DESCRIPTION
LINER_NOTES

Physical or digital liner notes from the release packaging. Strong documentary evidence.

TYPE: str

MEMORY

Personal recollection of the reviewer (e.g., artist remembering who played on a session). Subject to recall bias.

TYPE: str

DOCUMENT

Contractual or legal document (e.g., publishing agreement, session contract). Strongest documentary evidence.

TYPE: str

SESSION_NOTES

Studio session notes or recording logs. Strong evidence for engineering and performance credits.

TYPE: str

OTHER

Other evidence type not covered above. Requires free-text explanation in the FeedbackCard.free_text field.

TYPE: str

PermissionTypeEnum

Bases: StrEnum

Permission types for the MCP Permission Patchbay.

Defines the universe of machine-readable permission queries that AI platforms and other consumers can issue via MCP. The taxonomy is hierarchical: AI_TRAINING is a broad category with AI_TRAINING_COMPOSITION, AI_TRAINING_RECORDING, and AI_TRAINING_STYLE as finer-grained sub-permissions.

See Teikari (2026), section 7, for the Permission Patchbay design.

ATTRIBUTE DESCRIPTION
STREAM

Permission to stream the recording.

TYPE: str

DOWNLOAD

Permission to download the recording for offline use.

TYPE: str

SYNC_LICENSE

Synchronisation license (music paired with visual media).

TYPE: str

AI_TRAINING

Broad permission for AI model training on any aspect of the work.

TYPE: str

AI_TRAINING_COMPOSITION

AI training specifically on the compositional elements (melody, harmony, structure).

TYPE: str

AI_TRAINING_RECORDING

AI training specifically on the recording (audio signal, mix, production qualities).

TYPE: str

AI_TRAINING_STYLE

AI training on stylistic elements (timbre, groove, aesthetic).

TYPE: str

DATASET_INCLUSION

Inclusion in a published research or training dataset.

TYPE: str

VOICE_CLONING

Use of vocal performance for voice cloning / synthesis.

TYPE: str

STYLE_LEARNING

Learning artistic style for generative imitation.

TYPE: str

LYRICS_IN_CHATBOTS

Reproduction of lyrics in chatbot / LLM responses.

TYPE: str

COVER_VERSIONS

Permission to create and distribute cover versions.

TYPE: str

REMIX

Permission to create remixes of the recording.

TYPE: str

SAMPLE

Permission to sample portions of the recording.

TYPE: str

DERIVATIVE_WORK

Broad permission for any derivative work not covered above.

TYPE: str

PermissionValueEnum

Bases: StrEnum

Permission response values for MCP consent queries.

Each permission entry in a PermissionBundle resolves to one of these values. The values form a spectrum from unconditional denial to unconditional allowance, with conditional variants in between.

ATTRIBUTE DESCRIPTION
ALLOW

Unconditional permission granted.

TYPE: str

DENY

Permission explicitly denied. No exceptions.

TYPE: str

ASK

Permission not pre-determined; the requester must contact the rights holder for case-by-case approval.

TYPE: str

ALLOW_WITH_ATTRIBUTION

Permission granted on condition that proper attribution is included. Requires PermissionEntry.attribution_requirement to specify the required attribution text.

TYPE: str

ALLOW_WITH_ROYALTY

Permission granted on condition of royalty payment. Requires PermissionEntry.royalty_rate to specify the rate.

TYPE: str

PermissionScopeEnum

Bases: StrEnum

Permission scope levels defining granularity of consent.

Permissions can be set at different levels of granularity, from an entire catalog down to a single work. Broader scopes act as defaults that can be overridden by narrower scopes.

ATTRIBUTE DESCRIPTION
CATALOG

Applies to the entire catalog of the rights holder. Broadest scope. When scope is CATALOG, scope_entity_id must be None.

TYPE: str

RELEASE

Applies to a specific release (album, EP, single). Requires scope_entity_id pointing to the release entity.

TYPE: str

RECORDING

Applies to a specific recording. Requires scope_entity_id pointing to the recording entity.

TYPE: str

WORK

Applies to a specific musical work (composition). Requires scope_entity_id pointing to the work entity.

TYPE: str

DelegationRoleEnum

Bases: StrEnum

Delegation chain roles in the permission hierarchy.

A PermissionBundle may include a delegation chain showing who granted permission authority to whom. This enables audit trails for permission provenance (e.g., artist -> manager -> label -> distributor).

ATTRIBUTE DESCRIPTION
OWNER

Original rights holder (typically the artist or songwriter). Root of the delegation chain.

TYPE: str

MANAGER

Artist manager or business representative acting on behalf of the owner.

TYPE: str

LABEL

Record label holding master recording rights via contract.

TYPE: str

DISTRIBUTOR

Digital distributor handling platform delivery. Typically the outermost link in the delegation chain.

TYPE: str

PipelineFeedbackTypeEnum

Bases: StrEnum

Pipeline feedback signal types for continuous improvement.

These are reverse-flow signals between pipelines, enabling the system to self-correct. For example, the Attribution Engine can signal back to Entity Resolution that its confidence estimates were miscalibrated, or the API layer can signal a dispute.

ATTRIBUTE DESCRIPTION
REFETCH

Signal from Entity Resolution to ETL: "data from source X is consistently wrong or stale, re-fetch from the source."

TYPE: str

RECALIBRATE

Signal from Attribution Engine to Entity Resolution: "resolution confidence was miscalibrated; predicted confidence differs significantly from actual accuracy."

TYPE: str

DISPUTE

Signal from API/Chat to Attribution Engine: "a user or rights holder has disputed this attribution; re-evaluate."

TYPE: str

STALE

Signal from any pipeline: "this record has not been refreshed within its expected freshness window."

TYPE: str

UncertaintySourceEnum

Bases: StrEnum

Uncertainty source taxonomy based on UProp (Duan 2025).

Classifies the origin of uncertainty in confidence estimates. The intrinsic/extrinsic decomposition (Duan 2025, arXiv:2506.17419) is the primary axis; aleatoric/epistemic is the classical secondary axis for compatibility with standard ML uncertainty literature.

ATTRIBUTE DESCRIPTION
INTRINSIC

Intrinsic uncertainty arising from noise in the input data itself (e.g., conflicting metadata across sources, ambiguous artist names).

TYPE: str

EXTRINSIC

Extrinsic uncertainty arising from the model or pipeline (e.g., embedding model limitations, resolution algorithm edge cases).

TYPE: str

ALEATORIC

Irreducible uncertainty inherent in the data-generating process. Cannot be reduced by collecting more data.

TYPE: str

EPISTEMIC

Reducible uncertainty due to limited knowledge or data. Can be reduced by collecting more evidence or better models.

TYPE: str

UncertaintyDimensionEnum

Bases: StrEnum

4-dimensional uncertainty framework (Liu 2025, arXiv:2503.15850).

Orthogonal to the intrinsic/extrinsic decomposition, this framework decomposes uncertainty along the information processing pipeline: from input through reasoning to final prediction.

ATTRIBUTE DESCRIPTION
INPUT

Uncertainty in the input data (noise, missing fields, ambiguity). Maps to StepUncertainty.input_uncertainty.

TYPE: str

REASONING

Uncertainty in the reasoning/inference process (e.g., entity resolution logic, LLM chain-of-thought). Maps to StepUncertainty.reasoning_uncertainty.

TYPE: str

PARAMETER

Uncertainty in model parameters (e.g., embedding model weights, fuzzy matching thresholds). Maps to StepUncertainty.parameter_uncertainty.

TYPE: str

PREDICTION

Uncertainty in the final prediction/output (e.g., the confidence score itself). Maps to StepUncertainty.prediction_uncertainty.

TYPE: str

ConfidenceMethodEnum

Bases: StrEnum

Methods used to produce confidence scores.

Each StepUncertainty records which method was used to generate its confidence estimate. Methods vary in cost, reliability, and calibration quality. See Teikari (2026), section 5, for the confidence scoring framework.

ATTRIBUTE DESCRIPTION
SELF_REPORT

Source-reported confidence (e.g., MusicBrainz data quality rating). Cheapest but least calibrated.

TYPE: str

MULTI_SAMPLE

Multiple-sample consistency (e.g., querying an LLM multiple times and measuring agreement).

TYPE: str

LOGPROB

Token log-probability from an LLM. Fast but requires logprob API access.

TYPE: str

ENSEMBLE

Ensemble of multiple models or methods. More expensive but better calibrated than single-model approaches.

TYPE: str

CONFORMAL

Conformal prediction providing coverage guarantees. Produces prediction sets rather than point estimates.

TYPE: str

SOURCE_WEIGHTED

Weighted average across data sources based on historical reliability (Yanez 2025 approach).

TYPE: str

HUMAN_RATED

Human expert rating. Highest authority but most expensive and slowest.

TYPE: str

HTC

Holistic Trajectory Calibration (Zhang 2026, arXiv:2601.15778). Uses trajectory-level features across the full pipeline for calibration.

TYPE: str

CalibrationStatusEnum

Bases: StrEnum

Calibration status of a confidence score.

Indicates whether a confidence score has been post-hoc calibrated (e.g., via Platt scaling or isotonic regression) to ensure that stated confidence matches empirical accuracy.

ATTRIBUTE DESCRIPTION
CALIBRATED

Score has been calibrated against a held-out calibration set. CalibrationMetadata.expected_calibration_error is meaningful.

TYPE: str

UNCALIBRATED

Score is raw/uncalibrated. May exhibit over- or under-confidence.

TYPE: str

PENDING

Calibration is pending (insufficient calibration data collected so far). Score should be treated as uncalibrated.

TYPE: str
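For readers unfamiliar with the metric behind `expected_calibration_error`, the sketch below computes the standard binned form: the size-weighted average gap between mean confidence and empirical accuracy per bin. The function name, equal-width binning, and ten bins are common defaults chosen for illustration; how the project computes the stored value is not stated in this section.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign each prediction to an equal-width bin over [0, 1].
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

For example, four predictions at 0.9 confidence of which three are correct give an accuracy of 0.75 and therefore an ECE of 0.15.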

ConfidenceTrendEnum

Bases: StrEnum

Confidence trend across pipeline steps (Zhang 2026).

Characterises the trajectory of confidence scores as a record passes through the pipeline. Used by TrajectoryCalibration for HTC-based calibration. See Zhang (2026, arXiv:2601.15778).

ATTRIBUTE DESCRIPTION
INCREASING

Confidence monotonically increases across steps. Typical when multiple corroborating sources are found.

TYPE: str

DECREASING

Confidence monotonically decreases. May indicate conflicting evidence discovered during resolution.

TYPE: str

STABLE

Confidence remains approximately constant. Typical for records with strong initial evidence (e.g., exact ID match).

TYPE: str

VOLATILE

Confidence oscillates across steps. May indicate unstable resolution or contradictory evidence. Often triggers needs_review = True.

TYPE: str

AttributionMethodEnum

Bases: StrEnum

Training data attribution (TDA) methods.

Future-readiness stubs for commercial landscape parity with Musical AI, Sureel, ProRata, and Sony's influence-function approach. These methods attempt to quantify how much a specific training example influenced a generative model's output.

ATTRIBUTE DESCRIPTION
TRAINING_TIME_INFLUENCE

Influence measured at training time (e.g., data Shapley values, TracIn). Requires access to training checkpoints.

TYPE: str

UNLEARNING_BASED

Influence measured via machine unlearning (retrain-without and compare). Expensive but theoretically sound.

TYPE: str

INFLUENCE_FUNCTIONS

Classical influence functions (Koh & Liang 2017). Approximates leave-one-out retraining via Hessian-vector products.

TYPE: str

EMBEDDING_SIMILARITY

Cosine similarity in embedding space between source and generated content. Cheapest but least rigorous.

TYPE: str

WATERMARK_DETECTION

Detection of embedded watermarks in generated content that trace back to training data (e.g., SynthID, AudioSeal).

TYPE: str

INFERENCE_TIME_CONDITIONING

Attribution via inference-time conditioning or prompting (e.g., Musical AI's approach of conditioning generation on a known work).

TYPE: str

RightsTypeEnum

Bases: StrEnum

Music rights types distinguishing compositional vs recording rights.

Future-readiness stub. In music licensing, rights are split between the composition (publishing) side and the recording (master) side. This distinction is critical for AI training attribution: a model may learn from the composition, the recording, or both.

Based on Sureel patent and LANDR rights management approaches.

ATTRIBUTE DESCRIPTION
MASTER_RECORDING

Rights in the specific audio recording (sound recording copyright). Typically held by the label or artist.

TYPE: str

COMPOSITION_PUBLISHING

Rights in the underlying composition (musical work copyright). Typically held by the publisher or songwriter.

TYPE: str

PERFORMANCE

Performance rights (public performance, broadcast). Managed by PROs (ASCAP, BMI, PRS, GEMA, etc.).

TYPE: str

MECHANICAL

Mechanical reproduction rights (physical copies, downloads, interactive streams).

TYPE: str

SYNC

Synchronisation rights (pairing music with visual media).

TYPE: str

MediaTypeEnum

Bases: StrEnum

Multi-modal attribution media types.

Future-readiness stub for multi-modal training data attribution. While this scaffold focuses on audio, the Sureel and ProRata approaches are modality-agnostic and support cross-modal attribution.

ATTRIBUTE DESCRIPTION
AUDIO

Audio content (waveform, spectrogram). Primary modality for this scaffold.

TYPE: str

IMAGE

Image content (album art, spectrograms as images).

TYPE: str

VIDEO

Video content (music videos, live performances).

TYPE: str

TEXT

Text content (lyrics, liner notes, reviews).

TYPE: str

SYMBOLIC_MUSIC

Symbolic music representations (MIDI, MusicXML, ABC notation).

TYPE: str

MULTIMODAL

Content spanning multiple modalities simultaneously.

TYPE: str

CertificationTypeEnum

Bases: StrEnum

External certification and compliance attestation types.

Future-readiness stub for third-party certifications that validate an AI system's training data practices. These certifications are attached to ComplianceAttestation records.

ATTRIBUTE DESCRIPTION
FAIRLY_TRAINED_LICENSED

Fairly Trained certification indicating all training data was licensed or in the public domain.

TYPE: str

C2PA_PROVENANCE

C2PA (Coalition for Content Provenance and Authenticity) provenance manifest attached to generated content.

TYPE: str

EU_AI_ACT_COMPLIANT

Self-declared or audited compliance with EU AI Act requirements for general-purpose AI (GPAI) models.

TYPE: str

CMO_APPROVED

Approved by a Collective Management Organisation (CMO) such as GEMA, PRS, or ASCAP for training data usage.

TYPE: str

WatermarkTypeEnum

Bases: StrEnum

Audio watermark types for provenance tracking.

Future-readiness stub for audio watermarking systems that embed imperceptible identifiers in audio signals. Watermarks enable post-hoc attribution of AI-generated content back to training data or generation source.

ATTRIBUTE DESCRIPTION
SYNTHID

Google DeepMind's SynthID audio watermarking. Embeds identifiers in spectrogram space.

TYPE: str

AUDIOSEAL

Meta's AudioSeal. Localised audio watermarking with a detector that identifies watermarked segments.

TYPE: str

WAVMARK

WavMark academic watermarking approach. Embeds in the waveform domain.

TYPE: str

DIGIMARC

Digimarc commercial watermarking. Used in broadcast monitoring and content identification.

TYPE: str

RevenueModelEnum

Bases: StrEnum

Revenue sharing models for AI-generated music attribution.

Future-readiness stub for commercial revenue distribution models. Different platforms use different approaches to compensate rights holders whose works contributed to AI training.

ATTRIBUTE DESCRIPTION
FLAT_FEE_UPFRONT

One-time flat fee paid for training data licensing (e.g., LANDR model for stem packs).

TYPE: str

PRO_RATA_MONTHLY

Monthly pro-rata distribution based on catalog size or usage (e.g., streaming royalty model applied to AI training).

TYPE: str

PER_GENERATION

Payment per generation event that uses the rights holder's contribution (e.g., Kits AI voice model usage).

TYPE: str

INFLUENCE_BASED

Payment proportional to measured influence on generated output (e.g., Musical AI / Sureel approach using TDA methods).

TYPE: str

RegulatoryFrameworkEnum

Bases: StrEnum

Applicable regulatory and governance frameworks.

ISO 42001 defines internal AI governance roles; EU AI Act defines supply chain liability actors. They have zero terminological overlap and must be tracked separately. See Teikari (2026), section 8, for the regulatory mapping.

ATTRIBUTE DESCRIPTION
ISO_42001

ISO/IEC 42001 AI Management System standard. Defines internal governance roles (Top Management, AI System Owner, Internal Audit).

TYPE: str

EU_AI_ACT

EU Artificial Intelligence Act (Regulation (EU) 2024/1689). Defines risk categories and obligations for AI system providers/deployers.

TYPE: str

GPAI_CODE_OF_PRACTICE

General-Purpose AI Model Code of Practice (July 2025). Specifies transparency and copyright compliance requirements for GPAI models.

TYPE: str

DSM_DIRECTIVE

EU Digital Single Market Directive (Directive (EU) 2019/790). Arts. 3-4 govern text-and-data mining exceptions and opt-out mechanisms.

TYPE: str

ESPR_DPP

EU Ecodesign for Sustainable Products Regulation / Digital Product Passport. Cross-domain provenance framework.

TYPE: str

GDPR

EU General Data Protection Regulation. Relevant when attribution records contain personal data (artist identities, reviewer info).

TYPE: str

ComplianceActorEnum

Bases: StrEnum

EU AI Act supply chain actors (Art. 3).

These are distinct from ISO 42001 internal governance roles (Top Management, AI System Owner, Internal Audit). An organization may simultaneously hold multiple actor classifications across different AI systems.

ATTRIBUTE DESCRIPTION
PROVIDER

Entity that develops or has an AI system developed and places it on the market or puts it into service (Art. 3(3)).

TYPE: str

DEPLOYER

Entity that uses an AI system under its authority (Art. 3(4)). The music platform using the attribution system.

TYPE: str

AUTHORISED_REPRESENTATIVE

Entity established in the EU mandated by a non-EU provider to act on their behalf (Art. 3(5)).

TYPE: str

IMPORTER

Entity established in the EU that places an AI system from a third country on the EU market (Art. 3(6)).

TYPE: str

DISTRIBUTOR

Entity in the supply chain that makes an AI system available on the EU market (Art. 3(7)).

TYPE: str

PRODUCT_MANUFACTURER

Manufacturer of a product that integrates an AI system as a safety component (Art. 3(8)).

TYPE: str

TdmReservationMethodEnum

Bases: StrEnum

Text-and-data-mining rights reservation methods.

Under EU DSM Directive Art. 4, copyright holders can opt out of TDM via machine-readable reservation. The GPAI Code of Practice (July 2025) requires providers to respect robots.txt and emerging protocols. Music has a structural gap: robots.txt is web-only and does not cover audio content accessed via APIs or streaming platforms.

See Teikari (2026), section 7, for the music-specific gap analysis.

ATTRIBUTE DESCRIPTION
ROBOTS_TXT

Standard robots.txt file on web servers. Web-only; does not cover audio files served via APIs or streaming platforms.

TYPE: str

LLMS_TXT

Emerging llms.txt protocol for specifying LLM training permissions at the domain level.

TYPE: str

MACHINE_READABLE_TAG

HTML meta tags or HTTP headers expressing TDM reservation (e.g., <meta name="tdm-reservation" content="1">).

TYPE: str

RIGHTS_RESERVATION_API

Programmatic API for querying rights reservation status. More flexible than static files but requires infrastructure.

TYPE: str

MCP_PERMISSION_QUERY

Model Context Protocol permission query. The approach advocated by this scaffold: machine-readable consent queries via MCP tools.

TYPE: str

Normalized Record (ETL Output)

normalized

NormalizedRecord boundary object schema (BO-1).

Output of the Data Engineering (ETL) pipeline. A single music entity normalized from one external source. Multiple NormalizedRecord instances for the same real-world entity (from different sources) feed into the Entity Resolution pipeline, which merges them into a ResolvedEntity.

This module defines the first boundary object in the five-pipeline architecture. All ETL extractors -- regardless of source format -- produce NormalizedRecord instances with a common schema, enabling uniform downstream processing.

See Also

music_attribution.schemas.resolved : The next boundary object in the pipeline.

Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.

IdentifierBundle

Bases: BaseModel

Standard music industry identifiers bundle.

Collects all known standard identifiers for a music entity. At the A2/A3 assurance levels, at least one identifier must be present for machine-sourced records (MusicBrainz, Discogs, AcoustID). These identifiers are the primary key for exact-match entity resolution.

ATTRIBUTE DESCRIPTION
isrc

International Standard Recording Code. 12-character alphanumeric code uniquely identifying a specific recording (e.g., "GBAYE0601498"). Assigned by the IFPI.

TYPE: str or None

iswc

International Standard Musical Work Code. Identifies the underlying composition (e.g., "T-070.238.867-3"). Assigned by CISAC.

TYPE: str or None

isni

International Standard Name Identifier. 16-digit identifier for public identities of parties (e.g., "0000000121032683" for Imogen Heap). Assigned by the ISNI International Agency.

TYPE: str or None

ipi

Interested Party Information code. 9-11 digit code identifying rights holders in collecting society databases.

TYPE: str or None

mbid

MusicBrainz Identifier. UUID assigned by MusicBrainz to any entity in their database.

TYPE: str or None

discogs_id

Discogs numeric entity ID. Integer identifier in the Discogs database.

TYPE: int or None

acoustid

AcoustID identifier. UUID derived from audio fingerprint (Chromaprint) matching.

TYPE: str or None

Examples:

>>> bundle = IdentifierBundle(
...     isrc="GBAYE0601498",
...     mbid="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... )
>>> bundle.has_any()
True
>>> IdentifierBundle().has_any()
False

has_any

has_any() -> bool

Check if at least one identifier is set.

RETURNS DESCRIPTION
bool

True if any identifier field is not None.

Source code in src/music_attribution/schemas/normalized.py
def has_any(self) -> bool:
    """Check if at least one identifier is set.

    Returns
    -------
    bool
        True if any identifier field is not None.
    """
    return any(
        v is not None
        for v in (self.isrc, self.iswc, self.isni, self.ipi, self.mbid, self.discogs_id, self.acoustid)
    )
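Because the bundle is described as the primary key for exact-match entity resolution, a comparison between two bundles might look like the sketch below. It uses a plain dataclass as a stand-in so that it runs without Pydantic; IdBundle and shared_identifiers are hypothetical names, and the real resolution pipeline may compare identifiers differently.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IdBundle:
    """Hypothetical stand-in mirroring the documented IdentifierBundle fields."""

    isrc: str | None = None
    iswc: str | None = None
    isni: str | None = None
    ipi: str | None = None
    mbid: str | None = None
    discogs_id: int | None = None
    acoustid: str | None = None


def shared_identifiers(a: IdBundle, b: IdBundle) -> list[str]:
    """Names of identifier fields that are set on both bundles and equal."""
    return [
        f.name
        for f in fields(IdBundle)
        if getattr(a, f.name) is not None and getattr(a, f.name) == getattr(b, f.name)
    ]
```

For two records that carry the same ISRC but otherwise different identifiers, this returns ["isrc"]. A field that is None on either side never counts as a match.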

SourceMetadata

Bases: BaseModel

Typed source-specific metadata attached to a NormalizedRecord.

Contains supplementary information that varies by source but follows a common schema. Fields that do not apply to a particular source are left at their defaults (None or empty list).

ATTRIBUTE DESCRIPTION
roles

Credit roles reported by the source (free-text, not yet mapped to CreditRoleEnum). Examples: ["performer", "producer"].

TYPE: list of str

release_date

Release date as reported by the source. String format varies (ISO 8601 preferred, but partial dates like "2005" are common in MusicBrainz).

TYPE: str or None

release_country

ISO 3166-1 alpha-2 country code for the release territory (e.g., "GB", "US").

TYPE: str or None

genres

Genre tags reported by the source. Free-text, not standardised across sources.

TYPE: list of str

duration_ms

Track duration in milliseconds. May differ between sources due to different mastering or silence handling.

TYPE: int or None

track_number

Track position within the release medium.

TYPE: int or None

medium_format

Physical or digital medium format (e.g., "CD", "Vinyl", "Digital Media").

TYPE: str or None

language

ISO 639-1 language code for lyrics/vocals (e.g., "en").

TYPE: str or None

extras

Catch-all for source-specific fields that do not map to the common schema. Keys and values are both strings.

TYPE: dict of str to str

Examples:

>>> meta = SourceMetadata(
...     roles=["performer", "songwriter"],
...     release_date="2005-10-17",
...     release_country="GB",
...     genres=["electronic", "art pop"],
...     duration_ms=265000,
... )
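Since release_date is kept as a string and partial dates such as "2005" are common, downstream code has to tolerate missing month and day components. The helper below is a minimal sketch of one way to do that. parse_release_date is a hypothetical name, not part of this module, and it accepts only the "YYYY", "YYYY-MM", and "YYYY-MM-DD" shapes, without range-checking month or day.

```python
from __future__ import annotations

import re

# Year is mandatory; month and day are optional.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def parse_release_date(value: str | None) -> tuple[int, int | None, int | None] | None:
    """Split a release date string into (year, month, day), or return None."""
    if value is None:
        return None
    match = _DATE_RE.match(value.strip())
    if match is None:
        return None  # unrecognised format; the caller keeps the raw string
    year, month, day = match.groups()
    return int(year), int(month) if month else None, int(day) if day else None
```

Returning None for unrecognised formats, rather than raising, keeps the raw source value usable while signalling that it could not be interpreted.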

Relationship

Bases: BaseModel

Link between entities within a single data source.

Represents a directed edge from the parent NormalizedRecord to another entity identified by its source-specific ID. These relationships are source-local; cross-source relationship resolution happens in the Entity Resolution pipeline, producing ResolvedRelationship objects.

ATTRIBUTE DESCRIPTION
relationship_type

The type of relationship (e.g., PERFORMED, WROTE, PRODUCED).

TYPE: RelationshipTypeEnum

target_source

The data source of the target entity.

TYPE: SourceEnum

target_source_id

Source-specific identifier of the target entity (e.g., a MusicBrainz MBID or Discogs numeric ID as string).

TYPE: str

target_entity_type

The entity type of the target (e.g., ARTIST, RECORDING).

TYPE: EntityTypeEnum

attributes

Additional relationship attributes (e.g., {"instrument": "piano"}, {"begin_date": "2005-01"}).

TYPE: dict of str to str

Examples:

>>> rel = Relationship(
...     relationship_type=RelationshipTypeEnum.PERFORMED,
...     target_source=SourceEnum.MUSICBRAINZ,
...     target_source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     target_entity_type=EntityTypeEnum.ARTIST,
...     attributes={"instrument": "vocals"},
... )

NormalizedRecord

Bases: BaseModel

ETL output: normalized music metadata from a single external source.

The NormalizedRecord is the first boundary object (BO-1) in the five-pipeline architecture. All ETL extractors produce NormalizedRecord instances regardless of their source format. Multiple records for the same real-world entity (from different sources) are merged by the Entity Resolution pipeline into a ResolvedEntity.

ATTRIBUTE DESCRIPTION
schema_version

Semantic version of the NormalizedRecord schema. Defaults to "1.0.0". Used for forward/backward compatibility checks.

TYPE: str

record_id

Unique identifier for this record. Auto-generated UUIDv4.

TYPE: UUID

source

Which data source provided this record.

TYPE: SourceEnum

source_id

Source-specific identifier (e.g., MusicBrainz MBID, Discogs release ID as string).

TYPE: str

entity_type

The type of music entity this record represents.

TYPE: EntityTypeEnum

canonical_name

Primary name of the entity as reported by the source. Must be non-empty after whitespace stripping.

TYPE: str

alternative_names

Alternative names, aliases, or transliterations. Used during fuzzy entity resolution.

TYPE: list of str

identifiers

Standard music industry identifiers (ISRC, ISWC, ISNI, etc.). Machine sources (MusicBrainz, Discogs, AcoustID) must provide at least one identifier.

TYPE: IdentifierBundle

metadata

Source-specific metadata (genres, release date, duration, etc.).

TYPE: SourceMetadata

relationships

Links to other entities within the same source.

TYPE: list of Relationship

fetch_timestamp

UTC timestamp when this record was fetched from the source. Must be timezone-aware and not more than 60 seconds in the future (to catch clock skew).

TYPE: datetime

source_confidence

Source-reported confidence in the data, range [0.0, 1.0]. 0.0 = no confidence data available; 1.0 = verified by authority.

TYPE: float

raw_payload

Original API response preserved for debugging and re-processing. May be None if raw data is not retained.

TYPE: dict or None

Examples:

>>> from datetime import datetime, UTC
>>> record = NormalizedRecord(
...     source=SourceEnum.MUSICBRAINZ,
...     source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     entity_type=EntityTypeEnum.RECORDING,
...     canonical_name="Hide and Seek",
...     identifiers=IdentifierBundle(isrc="GBAYE0601498"),
...     fetch_timestamp=datetime.now(UTC),
...     source_confidence=0.87,
... )

See Also

ResolvedEntity : The next boundary object produced by Entity Resolution.

validate_canonical_name classmethod

validate_canonical_name(v: str) -> str

Canonical name must be non-empty after stripping.

Source code in src/music_attribution/schemas/normalized.py
@field_validator("canonical_name")
@classmethod
def validate_canonical_name(cls, v: str) -> str:
    """Canonical name must be non-empty after stripping."""
    if not v.strip():
        msg = "canonical_name must be non-empty after stripping whitespace"
        raise ValueError(msg)
    return v.strip()

validate_fetch_timestamp classmethod

validate_fetch_timestamp(v: datetime) -> datetime

Fetch timestamp must be timezone-aware and not far in the future.

Source code in src/music_attribution/schemas/normalized.py
@field_validator("fetch_timestamp")
@classmethod
def validate_fetch_timestamp(cls, v: datetime) -> datetime:
    """Fetch timestamp must be timezone-aware and not far in the future."""
    if v.tzinfo is None:
        msg = "fetch_timestamp must be timezone-aware (UTC)"
        raise ValueError(msg)
    max_future = datetime.now(UTC) + timedelta(seconds=60)
    if v > max_future:
        msg = "fetch_timestamp must not be more than 60 seconds in the future"
        raise ValueError(msg)
    return v
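To make the two rejection cases concrete, here is a standalone mirror of the validator's logic that runs without Pydantic. The function name and the now parameter are additions for illustration and deterministic testing only; the validator above always compares against the current time.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def check_fetch_timestamp(v: datetime, now: datetime | None = None) -> datetime:
    """Reject naive timestamps and timestamps more than 60 seconds ahead of now."""
    if v.tzinfo is None:
        raise ValueError("fetch_timestamp must be timezone-aware (UTC)")
    reference = now if now is not None else datetime.now(timezone.utc)
    if v > reference + timedelta(seconds=60):
        raise ValueError("fetch_timestamp must not be more than 60 seconds in the future")
    return v
```

A naive value such as datetime(2025, 1, 1, 12, 0) is rejected by the first check. An aware value 30 seconds ahead of the reference time passes, while one 61 seconds ahead is rejected by the second check.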

validate_identifiers_for_machine_sources

validate_identifiers_for_machine_sources() -> (
    NormalizedRecord
)

Machine sources require at least one identifier.

Source code in src/music_attribution/schemas/normalized.py
@model_validator(mode="after")
def validate_identifiers_for_machine_sources(self) -> NormalizedRecord:
    """Machine sources require at least one identifier."""
    machine_sources = {SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS, SourceEnum.ACOUSTID}
    if self.source in machine_sources and not self.identifiers.has_any():
        msg = f"Source {self.source} requires at least one identifier in IdentifierBundle"
        raise ValueError(msg)
    return self

Resolved Entity (Resolution Output)

resolved

ResolvedEntity boundary object schema (BO-2).

Output of the Entity Resolution pipeline. A unified entity that merges multiple NormalizedRecord instances from different sources into a single canonical entity with resolution confidence and assurance level.

The ResolvedEntity is the second boundary object in the five-pipeline architecture. It carries forward the provenance of every source that contributed to it, enabling downstream attribution scoring to weight sources by reliability.

See Also

music_attribution.schemas.normalized : The preceding boundary object.

music_attribution.schemas.attribution : The next boundary object.

Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.

SourceReference

Bases: BaseModel

Reference to a contributing NormalizedRecord.

Links a ResolvedEntity back to the specific NormalizedRecord that contributed to it, preserving full provenance. The agreement score measures how well this source's data aligns with the resolved consensus.

ATTRIBUTE DESCRIPTION
record_id

UUID of the contributing NormalizedRecord.

TYPE: UUID

source

Which data source provided the record.

TYPE: SourceEnum

source_id

Source-specific identifier of the record.

TYPE: str

agreement_score

How well this source agrees with the resolved consensus, range [0.0, 1.0]. 1.0 = perfect agreement on all fields; 0.0 = complete disagreement.

TYPE: float

Examples:

>>> import uuid
>>> ref = SourceReference(
...     record_id=uuid.uuid4(),
...     source=SourceEnum.MUSICBRAINZ,
...     source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     agreement_score=0.95,
... )

ResolutionDetails

Bases: BaseModel

Per-method confidence breakdown for entity resolution.

Records the confidence contribution from each resolution method that was attempted. A populated field means the method was applied; None means it was not. This enables post-hoc analysis of which methods are most effective for different entity types.

ATTRIBUTE DESCRIPTION
string_similarity

Confidence from fuzzy string matching (Jaro-Winkler, Levenshtein), range [0.0, 1.0]. None if not attempted.

TYPE: float or None

embedding_similarity

Confidence from semantic embedding similarity (cosine similarity), range [0.0, 1.0]. None if not attempted.

TYPE: float or None

graph_path_confidence

Confidence from graph-based resolution (path length, co-occurrence patterns), range [0.0, 1.0]. None if not attempted.

TYPE: float or None

llm_confidence

Confidence from LLM-assisted resolution, range [0.0, 1.0]. None if not attempted.

TYPE: float or None

matched_identifiers

Names of identifiers that matched exactly (e.g., ["isrc", "mbid"]). Empty if no exact matches.

TYPE: list of str

Examples:

>>> details = ResolutionDetails(
...     string_similarity=0.92,
...     matched_identifiers=["isrc"],
... )
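The kind of post-hoc analysis mentioned above could start from something like the following sketch, which takes the four per-method scores as a plain dict (for example, a dumped ResolutionDetails) and reports which methods were applied. The helper names are hypothetical and not part of this module.

```python
from __future__ import annotations

# The four per-method confidence fields documented for ResolutionDetails.
METHOD_FIELDS = (
    "string_similarity",
    "embedding_similarity",
    "graph_path_confidence",
    "llm_confidence",
)


def methods_applied(details: dict[str, float | None]) -> dict[str, float]:
    """Keep only the methods whose score is populated (not None)."""
    return {name: details[name] for name in METHOD_FIELDS if details.get(name) is not None}


def strongest_method(details: dict[str, float | None]) -> str | None:
    """Name of the applied method with the highest confidence, if any."""
    applied = methods_applied(details)
    return max(applied, key=applied.get) if applied else None
```

Treating None as "not applied" rather than as a score of zero matters here: averaging over unattempted methods would understate the confidence of the ones that actually ran.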

ResolvedRelationship

Bases: BaseModel

Resolved cross-entity relationship link.

Unlike source-local Relationship objects in NormalizedRecord, a ResolvedRelationship links two ResolvedEntity instances and is backed by one or more corroborating data sources.

ATTRIBUTE DESCRIPTION
target_entity_id

UUID of the target ResolvedEntity.

TYPE: UUID

relationship_type

The type of relationship (e.g., PERFORMED, WROTE).

TYPE: RelationshipTypeEnum

confidence

Confidence in this relationship, range [0.0, 1.0]. Higher when multiple sources corroborate the link.

TYPE: float

supporting_sources

Data sources that corroborate this relationship. More sources generally means higher confidence.

TYPE: list of SourceEnum

Examples:

>>> import uuid
>>> rel = ResolvedRelationship(
...     target_entity_id=uuid.uuid4(),
...     relationship_type=RelationshipTypeEnum.PERFORMED,
...     confidence=0.92,
...     supporting_sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
... )

Conflict

Bases: BaseModel

Unresolved disagreement between data sources.

When entity resolution encounters contradictory information from different sources for the same field, it records a Conflict rather than silently choosing one value. Conflicts with severity HIGH or CRITICAL trigger needs_review = True on the parent ResolvedEntity.

ATTRIBUTE DESCRIPTION
field

Name of the field in conflict (e.g., "canonical_name", "release_date").

TYPE: str

values

Mapping of source name to its reported value. Keys are source identifiers (e.g., "MUSICBRAINZ"), values are the conflicting field values.

TYPE: dict of str to str

severity

How severe the disagreement is, from LOW (auto-resolvable) to CRITICAL (blocks attribution).

TYPE: ConflictSeverityEnum

Examples:

>>> conflict = Conflict(
...     field="canonical_name",
...     values={"MUSICBRAINZ": "Imogen Heap", "DISCOGS": "I. Heap"},
...     severity=ConflictSeverityEnum.LOW,
... )
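The rule that HIGH or CRITICAL conflicts trigger needs_review can be sketched as below. Severity is a hypothetical stand-in for ConflictSeverityEnum: only LOW, HIGH, and CRITICAL are named in this documentation, so the MEDIUM member and all string values are assumptions, as is the wording of the generated review reason.

```python
from __future__ import annotations

from enum import Enum


class Severity(str, Enum):
    """Hypothetical stand-in for ConflictSeverityEnum (values are assumed)."""

    LOW = "low"
    MEDIUM = "medium"  # assumed intermediate level, not confirmed by the docs
    HIGH = "high"
    CRITICAL = "critical"


# Severities that force human review, per the Conflict description above.
REVIEW_SEVERITIES = {Severity.HIGH, Severity.CRITICAL}


def review_status(conflicts: list[tuple[str, Severity]]) -> tuple[bool, str | None]:
    """Return (needs_review, review_reason) for (field, severity) pairs."""
    blocking = [field for field, severity in conflicts if severity in REVIEW_SEVERITIES]
    if not blocking:
        return False, None
    return True, "High-severity conflict on: " + ", ".join(blocking)
```

Producing the reason together with the flag keeps the pair consistent with the ResolvedEntity rule that review_reason must be set whenever needs_review is True.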

ResolvedEntity

Bases: BaseModel

Unified entity resolved from multiple data sources.

The ResolvedEntity is the second boundary object (BO-2) in the five-pipeline architecture. It is produced by the Entity Resolution pipeline and consumed by the Attribution Engine. Each instance represents a single real-world music entity (artist, recording, work, etc.) with a canonical identity established by merging one or more NormalizedRecord instances.

ATTRIBUTE DESCRIPTION
schema_version

Semantic version of the ResolvedEntity schema. Defaults to "1.0.0".

TYPE: str

entity_id

Unique identifier for this resolved entity. Auto-generated UUIDv4.

TYPE: UUID

entity_type

The type of music entity (RECORDING, WORK, ARTIST, etc.).

TYPE: EntityTypeEnum

canonical_name

Best-consensus name for the entity, chosen from contributing sources by the resolution algorithm.

TYPE: str

alternative_names

All other names/aliases from contributing sources, used for future matching and display.

TYPE: list of str

identifiers

Merged identifier bundle combining identifiers from all contributing sources.

TYPE: IdentifierBundle

source_records

References to all NormalizedRecord instances that were merged into this entity. Must contain at least one.

TYPE: list of SourceReference

resolution_method

Primary method used to resolve/merge the source records.

TYPE: ResolutionMethodEnum

resolution_confidence

Overall confidence in the resolution, range [0.0, 1.0]. This is the resolution pipeline's assessment of how likely it is that all merged records truly refer to the same entity.

TYPE: float

resolution_details

Per-method confidence breakdown showing which methods contributed and their individual confidence scores.

TYPE: ResolutionDetails

assurance_level

A0-A3 assurance level determined by the number and quality of corroborating sources. See Teikari (2026), section 6.

TYPE: AssuranceLevelEnum

relationships

Cross-entity links resolved from source-local relationships.

TYPE: list of ResolvedRelationship

conflicts

Unresolved disagreements between sources. May trigger human review if severity is HIGH or CRITICAL.

TYPE: list of Conflict

needs_review

Flag indicating this entity requires human review before attribution scoring proceeds.

TYPE: bool

review_reason

Human-readable explanation of why review is needed. Required when needs_review is True.

TYPE: str or None

merged_from

If this entity was formed by merging previously separate ResolvedEntity instances, their IDs are listed here.

TYPE: list of UUID or None

resolved_at

UTC timestamp when resolution was performed. Must be timezone-aware.

TYPE: datetime

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> entity = ResolvedEntity(
...     entity_type=EntityTypeEnum.RECORDING,
...     canonical_name="Hide and Seek",
...     source_records=[
...         SourceReference(
...             record_id=uuid.uuid4(),
...             source=SourceEnum.MUSICBRAINZ,
...             source_id="abc-123",
...             agreement_score=0.95,
...         ),
...     ],
...     resolution_method=ResolutionMethodEnum.EXACT_ID,
...     resolution_confidence=0.98,
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
...     resolved_at=datetime.now(UTC),
... )

See Also

NormalizedRecord : The preceding boundary object from ETL.

AttributionRecord : The next boundary object from Attribution Engine.

validate_resolved_at classmethod

validate_resolved_at(v: datetime) -> datetime

Resolved timestamp must be timezone-aware.

Source code in src/music_attribution/schemas/resolved.py
@field_validator("resolved_at")
@classmethod
def validate_resolved_at(cls, v: datetime) -> datetime:
    """Resolved timestamp must be timezone-aware."""
    if v.tzinfo is None:
        msg = "resolved_at must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_review_fields

validate_review_fields() -> ResolvedEntity

If needs_review is True, review_reason must be provided.

Source code in src/music_attribution/schemas/resolved.py
@model_validator(mode="after")
def validate_review_fields(self) -> ResolvedEntity:
    """If needs_review is True, review_reason must be provided."""
    if self.needs_review and self.review_reason is None:
        msg = "review_reason must be non-None when needs_review is True"
        raise ValueError(msg)
    return self

Attribution Record (Engine Output)

attribution

AttributionRecord boundary object schema (BO-3).

Output of the Attribution Engine pipeline. A complete attribution record for a musical work/recording with calibrated confidence scores, conformal prediction sets, and a full provenance chain.

The AttributionRecord is the third boundary object in the five-pipeline architecture and the primary output consumed by the API/MCP Server and Chat Interface pipelines. It carries the complete audit trail of how an attribution was constructed, enabling transparent confidence communication to end users.

See Also

music_attribution.schemas.resolved : The preceding boundary object.

music_attribution.schemas.feedback : Reverse-flow feedback from users.

music_attribution.schemas.uncertainty : Uncertainty decomposition models.

Teikari, P. (2026). Music Attribution with Transparent Confidence, sections 5-6.

EventDetails module-attribute

EventDetails = Annotated[
    FetchEventDetails
    | ResolveEventDetails
    | ScoreEventDetails
    | ReviewEventDetails
    | UpdateEventDetails
    | FeedbackEventDetails,
    Field(discriminator="type"),
]

Discriminated union of provenance event detail types.

Uses Pydantic's discriminator field (type) to deserialize into the correct detail class. Each variant corresponds to a ProvenanceEventTypeEnum value.
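Pydantic performs the discriminator dispatch internally. The plain-Python sketch below only illustrates the idea of selecting a class from the type tag, using two simplified stand-in dataclasses. FetchDetails, ScoreDetails, and parse_event_details are hypothetical names, not part of this module, and no field validation is performed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FetchDetails:
    """Simplified stand-in for FetchEventDetails."""

    source: str
    source_id: str
    records_fetched: int
    rate_limited: bool


@dataclass
class ScoreDetails:
    """Simplified stand-in for ScoreEventDetails."""

    previous_confidence: float | None
    new_confidence: float
    scoring_method: str


# The "type" value selects which class receives the remaining fields.
_VARIANTS = {"fetch": FetchDetails, "score": ScoreDetails}


def parse_event_details(payload: dict[str, Any]) -> FetchDetails | ScoreDetails:
    """Dispatch on the discriminator tag, then build the matching variant."""
    remaining = dict(payload)
    tag = remaining.pop("type", None)
    if tag not in _VARIANTS:
        raise ValueError(f"unknown event details type: {tag!r}")
    return _VARIANTS[tag](**remaining)
```

Because the tag is inspected first, an unknown or missing type fails immediately with a single clear error instead of being tried against every member of the union in turn.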

Credit

Bases: BaseModel

Attribution credit for a single entity-role pair.

Represents one line item in the attribution: a specific entity (artist, producer, etc.) credited in a specific role on a musical work or recording. Each credit carries its own confidence score and assurance level, independent of the overall record.

ATTRIBUTE DESCRIPTION
entity_id

UUID of the ResolvedEntity receiving this credit.

TYPE: UUID

entity_name

Display name of the credited entity. Defaults to empty string; populated for API/UI convenience.

TYPE: str

role

The role in which the entity is credited (e.g., PERFORMER, SONGWRITER, PRODUCER).

TYPE: CreditRoleEnum

role_detail

Additional role detail not captured by the enum (e.g., "lead vocals", "bass guitar").

TYPE: str or None

confidence

Confidence in this specific credit assignment, range [0.0, 1.0]. May differ from the overall record confidence.

TYPE: float

sources

Data sources that corroborate this credit. More sources generally yield higher confidence.

TYPE: list of SourceEnum

assurance_level

A0-A3 assurance level for this specific credit.

TYPE: AssuranceLevelEnum

Examples:

>>> import uuid
>>> credit = Credit(
...     entity_id=uuid.uuid4(),
...     entity_name="Imogen Heap",
...     role=CreditRoleEnum.PERFORMER,
...     role_detail="lead vocals, keyboards",
...     confidence=0.95,
...     sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
... )

ConformalSet

Bases: BaseModel

Conformal prediction set at a specified coverage level.

Instead of a single point prediction for each credit role, conformal prediction produces a set of plausible roles that contains the true role with probability at least the coverage level, under the usual exchangeability assumption. Smaller sets indicate higher confidence. See Teikari (2026), section 5.

ATTRIBUTE DESCRIPTION
coverage_level

Target coverage probability, range (0.0, 1.0) exclusive. Typical values: 0.90 (90% coverage) or 0.95 (95% coverage).

TYPE: float

prediction_sets

Mapping of entity ID (as string) to the set of plausible roles for that entity. Smaller sets = more certain attribution.

TYPE: dict of str to list of CreditRoleEnum

set_sizes

Mapping of entity ID to the cardinality of their prediction set. A set size of 1 means the role is unambiguous at the given coverage level.

TYPE: dict of str to int

marginal_coverage

Observed marginal coverage on the calibration set, range [0.0, 1.0]. Should be close to coverage_level if well calibrated.

TYPE: float

calibration_error

Absolute difference between coverage_level and marginal_coverage. Lower is better. Non-negative.

TYPE: float

calibration_method

Name of the calibration method used (e.g., "split_conformal", "jackknife_plus").

TYPE: str

calibration_set_size

Number of examples in the calibration set. Larger sets give tighter coverage guarantees. Non-negative.

TYPE: int

Examples:

>>> conformal = ConformalSet(
...     coverage_level=0.90,
...     prediction_sets={"entity-uuid": [CreditRoleEnum.PERFORMER]},
...     set_sizes={"entity-uuid": 1},
...     marginal_coverage=0.91,
...     calibration_error=0.01,
...     calibration_method="split_conformal",
...     calibration_set_size=500,
... )
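Two of the fields are fully determined by others: set_sizes is the cardinality of each prediction set, and calibration_error is the absolute gap between coverage_level and marginal_coverage. The sketch below derives both from plain values. The function names are hypothetical, and whether the model itself enforces this consistency is not stated here.

```python
from __future__ import annotations

import math


def derive_set_sizes(prediction_sets: dict[str, list[str]]) -> dict[str, int]:
    """Cardinality of each entity's prediction set."""
    return {entity_id: len(roles) for entity_id, roles in prediction_sets.items()}


def derive_calibration_error(coverage_level: float, marginal_coverage: float) -> float:
    """Absolute difference between target and observed coverage."""
    return abs(coverage_level - marginal_coverage)


def unambiguous_entities(prediction_sets: dict[str, list[str]]) -> list[str]:
    """Entity IDs whose prediction set contains exactly one role."""
    return [entity_id for entity_id, size in derive_set_sizes(prediction_sets).items() if size == 1]


def is_consistent(coverage_level: float, marginal_coverage: float, reported_error: float) -> bool:
    """Check a reported calibration_error against the derived value."""
    return math.isclose(derive_calibration_error(coverage_level, marginal_coverage), reported_error)
```

With the values from the example above (0.90 target, 0.91 observed), the derived error matches the reported 0.01 up to floating-point rounding, which is why the comparison uses math.isclose rather than ==.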

FetchEventDetails

Bases: BaseModel

Details for FETCH provenance events.

Records metadata about a data fetch operation from an external source as part of the provenance chain.

ATTRIBUTE DESCRIPTION
type

Discriminator field for the EventDetails union. Always "fetch".

TYPE: Literal['fetch']

source

The data source that was queried.

TYPE: SourceEnum

source_id

Source-specific query identifier or endpoint.

TYPE: str

records_fetched

Number of records returned by the fetch. Non-negative.

TYPE: int

rate_limited

Whether the fetch was rate-limited by the source API.

TYPE: bool

ResolveEventDetails

Bases: BaseModel

Details for RESOLVE provenance events.

Records metadata about an entity resolution step, including the method used and the reduction ratio (input records to output entities).

ATTRIBUTE DESCRIPTION
type

Discriminator field. Always "resolve".

TYPE: Literal['resolve']

method

Name of the resolution method or algorithm used.

TYPE: str

records_input

Number of NormalizedRecord instances fed into resolution. Non-negative.

TYPE: int

entities_output

Number of ResolvedEntity instances produced. Non-negative. Should be <= records_input.

TYPE: int

confidence_range

(min, max) confidence range across all output entities. Defaults to (0.0, 1.0).

TYPE: tuple of (float, float)

ScoreEventDetails

Bases: BaseModel

Details for SCORE provenance events.

Records a confidence scoring or recalibration step, showing how the confidence value changed.

ATTRIBUTE DESCRIPTION
type

Discriminator field. Always "score".

TYPE: Literal['score']

previous_confidence

Confidence before this scoring step, range [0.0, 1.0]. None for the initial scoring event.

TYPE: float or None

new_confidence

Confidence after this scoring step, range [0.0, 1.0].

TYPE: float

scoring_method

Name of the scoring/calibration method applied (e.g., "source_weighted_average", "platt_scaling").

TYPE: str

ReviewEventDetails

Bases: BaseModel

Details for REVIEW provenance events.

Records that a human reviewer examined the attribution and optionally applied corrections from a FeedbackCard.

ATTRIBUTE DESCRIPTION
type

Discriminator field. Always "review".

TYPE: Literal['review']

reviewer_id

Identifier of the reviewer who performed the review.

TYPE: str

feedback_card_id

UUID of the FeedbackCard that was applied.

TYPE: UUID

corrections_applied

Number of field corrections accepted from the feedback card. Non-negative. Zero means the reviewer confirmed the record without changes.

TYPE: int

UpdateEventDetails

Bases: BaseModel

Details for UPDATE provenance events.

Records a version bump on the attribution record, including which fields changed and what triggered the update.

ATTRIBUTE DESCRIPTION
type

Discriminator field. Always "update".

TYPE: Literal['update']

previous_version

Version number before this update. Minimum 1.

TYPE: int

new_version

Version number after this update. Minimum 1. Should be previous_version + 1.

TYPE: int

fields_changed

Names of fields that were modified in this update.

TYPE: list of str

trigger

What triggered the update (e.g., "feedback_accepted", "source_refresh", "conflict_resolved").

TYPE: str

FeedbackEventDetails

Bases: BaseModel

Details for FEEDBACK provenance events.

Records that a FeedbackCard was processed by the Attribution Engine and its corrections were either accepted or rejected.

ATTRIBUTE DESCRIPTION
type

Discriminator field. Always "feedback".

TYPE: Literal['feedback']

feedback_card_id

UUID of the FeedbackCard that was processed.

TYPE: UUID

overall_assessment

The reviewer's overall assessment score from the feedback card, range [0.0, 1.0].

TYPE: float

corrections_count

Number of corrections in the feedback card. Non-negative.

TYPE: int

accepted

Whether the feedback was accepted and applied to the attribution record.

TYPE: bool

ProvenanceEvent

Bases: BaseModel

Single event in the attribution provenance audit trail.

Each ProvenanceEvent records one discrete action that contributed to or modified an attribution record. The chain of events forms an immutable audit trail enabling full transparency of how an attribution was constructed and refined.

ATTRIBUTE DESCRIPTION
event_type

High-level event type (FETCH, RESOLVE, SCORE, REVIEW, UPDATE, FEEDBACK).

TYPE: ProvenanceEventTypeEnum

timestamp

UTC timestamp when this event occurred. Must be timezone-aware.

TYPE: datetime

agent

Identifier of the software agent or human that performed this action (e.g., "etl-musicbrainz-v1.2", "reviewer-jdoe").

TYPE: str

details

Typed event details (discriminated union). The concrete type matches event_type.

TYPE: EventDetails

feedback_card_id

UUID of the associated FeedbackCard, if this event was triggered by user feedback.

TYPE: UUID or None

step_uncertainty

Uncertainty decomposition for this specific pipeline step, if available.

TYPE: StepUncertainty or None

citation_index

1-based citation index for referencing this event in chat responses. None if not cited. Minimum 1 when set.

TYPE: int or None

validate_timestamp classmethod

validate_timestamp(v: datetime) -> datetime

Timestamp must be timezone-aware.

Source code in src/music_attribution/schemas/attribution.py
@field_validator("timestamp")
@classmethod
def validate_timestamp(cls, v: datetime) -> datetime:
    """Timestamp must be timezone-aware."""
    if v.tzinfo is None:
        msg = "timestamp must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

AttributionRecord

Bases: BaseModel

Complete attribution record for a musical work or recording.

The AttributionRecord is the third boundary object (BO-3) in the five-pipeline architecture and the primary deliverable of the Attribution Engine. It is consumed by the API/MCP Server (for permission queries) and the Chat Interface (for user-facing attribution display).

Each record contains: (1) a list of credits with per-credit confidence, (2) conformal prediction sets providing coverage guarantees, (3) an immutable provenance chain, and (4) an optional uncertainty summary. See Teikari (2026), sections 5-6.

ATTRIBUTE DESCRIPTION
schema_version

Semantic version of the AttributionRecord schema. Defaults to "1.0.0".

TYPE: str

attribution_id

Unique identifier for this attribution record. Auto-generated UUIDv4.

TYPE: UUID

work_entity_id

UUID of the ResolvedEntity (work or recording) that this attribution describes.

TYPE: UUID

work_title

Display title of the work. Populated for API/UI convenience.

TYPE: str

artist_name

Display name of the primary artist. Populated for API/UI convenience.

TYPE: str

credits

Attribution credits. Must contain at least one credit. Each credit links an entity to a role with confidence scoring.

TYPE: list of Credit

assurance_level

Overall A0-A3 assurance level for this attribution record, determined by the weakest link in the evidence chain.

TYPE: AssuranceLevelEnum

confidence_score

Overall calibrated confidence score, range [0.0, 1.0]. Aggregated from per-credit confidences and source agreement.

TYPE: float

conformal_set

Conformal prediction set providing coverage guarantees on role assignments.

TYPE: ConformalSet

source_agreement

Degree of agreement across data sources, range [0.0, 1.0]. 1.0 = all sources agree on all credits; 0.0 = total disagreement.

TYPE: float

provenance_chain

Ordered list of provenance events forming the audit trail. Events are appended chronologically.

TYPE: list of ProvenanceEvent

uncertainty_summary

Aggregated uncertainty decomposition across all pipeline steps. None if uncertainty tracking is not enabled.

TYPE: UncertaintyAwareProvenance or None

needs_review

Flag indicating this record requires human review before being surfaced to end users.

TYPE: bool

review_priority

Priority score for the review queue, range [0.0, 1.0]. Higher values = more urgent review needed.

TYPE: float

created_at

UTC timestamp when this record was first created. Must be timezone-aware.

TYPE: datetime

updated_at

UTC timestamp of the most recent update. Must be timezone-aware. Must be >= created_at.

TYPE: datetime

version

Monotonically increasing version number. Minimum 1. Bumped on every update.

TYPE: int

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> record = AttributionRecord(
...     work_entity_id=uuid.uuid4(),
...     work_title="Hide and Seek",
...     artist_name="Imogen Heap",
...     credits=[
...         Credit(
...             entity_id=uuid.uuid4(),
...             entity_name="Imogen Heap",
...             role=CreditRoleEnum.PERFORMER,
...             confidence=0.95,
...             sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
...             assurance_level=AssuranceLevelEnum.LEVEL_2,
...         ),
...     ],
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
...     confidence_score=0.92,
...     conformal_set=ConformalSet(
...         coverage_level=0.90,
...         marginal_coverage=0.91,
...         calibration_error=0.01,
...         calibration_method="split_conformal",
...         calibration_set_size=500,
...     ),
...     source_agreement=0.88,
...     review_priority=0.1,
...     created_at=datetime.now(UTC),
...     updated_at=datetime.now(UTC),
...     version=1,
... )

See Also

ResolvedEntity : The preceding boundary object from Entity Resolution.

FeedbackCard : Reverse-flow feedback for calibration updates.

validate_timestamps classmethod

validate_timestamps(v: datetime) -> datetime

Timestamps must be timezone-aware.

Source code in src/music_attribution/schemas/attribution.py
@field_validator("created_at", "updated_at")
@classmethod
def validate_timestamps(cls, v: datetime) -> datetime:
    """Timestamps must be timezone-aware."""
    if v.tzinfo is None:
        msg = "Timestamps must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_updated_after_created

validate_updated_after_created() -> AttributionRecord

updated_at must be >= created_at.

Source code in src/music_attribution/schemas/attribution.py
@model_validator(mode="after")
def validate_updated_after_created(self) -> AttributionRecord:
    """updated_at must be >= created_at."""
    if self.updated_at < self.created_at:
        msg = "updated_at must be >= created_at"
        raise ValueError(msg)
    return self
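The two timestamp rules above can be exercised without Pydantic. The following is a minimal stdlib sketch, not the project's code: `check_timestamps` and `is_rejected` are hypothetical helpers that mirror the checks in `validate_timestamps` and `validate_updated_after_created`.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(created_at: datetime, updated_at: datetime) -> None:
    """Hypothetical stdlib mirror of the two AttributionRecord timestamp rules."""
    for name, value in (("created_at", created_at), ("updated_at", updated_at)):
        # A naive datetime has tzinfo=None and is rejected before any comparison.
        if value.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware (UTC)")
    if updated_at < created_at:
        raise ValueError("updated_at must be >= created_at")


def is_rejected(created_at: datetime, updated_at: datetime) -> bool:
    """Return True when the pair violates either rule."""
    try:
        check_timestamps(created_at, updated_at)
    except ValueError:
        return True
    return False


now = datetime.now(timezone.utc)
check_timestamps(now, now)                         # equal timestamps pass
check_timestamps(now, now + timedelta(seconds=5))  # a later update passes
```

As in the validator shown above, the check only requires that `tzinfo` is set; it does not verify that the offset is actually UTC.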

Permissions

permissions

PermissionBundle boundary object schema (BO-5).

Machine-readable permission specification for MCP consent queries. Implements the Permission Patchbay from Teikari (2026), section 7.

The PermissionBundle enables AI platforms and other consumers to programmatically query whether specific uses of a musical work are permitted, under what conditions, and who authorised the permission. This is the MCP-native alternative to robots.txt for audio content.

See Also

music_attribution.schemas.enums : Permission-related enums.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 7 (Permission Patchbay).

PermissionCondition

Bases: BaseModel

Optional condition attached to a permission entry.

Conditions qualify a permission with additional constraints. For example, a permission might only apply in certain territories or time periods.

ATTRIBUTE DESCRIPTION
condition_type

Type of condition (e.g., "territory", "date_range", "max_duration_seconds", "non_commercial_only").

TYPE: str

value

Condition value as a string. Format depends on condition_type (e.g., "US,GB,DE" for territory, "2025-01-01/2026-12-31" for date range).

TYPE: str

Examples:

>>> cond = PermissionCondition(
...     condition_type="territory",
...     value="US,GB,DE",
... )
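Because `value` is an untyped string, a consumer has to parse it according to `condition_type`. The sketch below shows one way to do that for the two formats quoted above; `parse_territory` and `parse_date_range` are hypothetical helpers and are not part of the schema module.

```python
from datetime import date


def parse_territory(value: str) -> list[str]:
    """Split a comma-separated territory value such as "US,GB,DE"."""
    return [code.strip().upper() for code in value.split(",") if code.strip()]


def parse_date_range(value: str) -> tuple[date, date]:
    """Split a "start/end" value such as "2025-01-01/2026-12-31" into two dates."""
    start_text, end_text = value.split("/")
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if start > end:
        raise ValueError("date range start must not be after its end")
    return start, end


territories = parse_territory("US,GB,DE")
start, end = parse_date_range("2025-01-01/2026-12-31")
```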

PermissionEntry

Bases: BaseModel

A single permission with optional conditions and requirements.

Each entry specifies a permission type (what use), a value (allow, deny, ask, or conditional), and optional requirements. Conditional values (ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY) require their respective fields to be populated.

ATTRIBUTE DESCRIPTION
permission_type

What kind of use this permission governs (e.g., AI_TRAINING, STREAM, REMIX).

TYPE: PermissionTypeEnum

value

The permission decision (ALLOW, DENY, ASK, ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY).

TYPE: PermissionValueEnum

conditions

Additional conditions qualifying this permission.

TYPE: list of PermissionCondition

royalty_rate

Royalty rate as a decimal (e.g., Decimal("0.015") for 1.5%). Required when value is ALLOW_WITH_ROYALTY; must be > 0.

TYPE: Decimal or None

attribution_requirement

Required attribution text or format. Required when value is ALLOW_WITH_ATTRIBUTION.

TYPE: str or None

territory

ISO 3166-1 alpha-2 country codes where this permission applies. None means worldwide.

TYPE: list of str or None

Examples:

>>> from decimal import Decimal
>>> entry = PermissionEntry(
...     permission_type=PermissionTypeEnum.AI_TRAINING,
...     value=PermissionValueEnum.ALLOW_WITH_ROYALTY,
...     royalty_rate=Decimal("0.015"),
...     territory=["US", "GB"],
... )

validate_conditional_fields

validate_conditional_fields() -> PermissionEntry

Validate fields required by specific permission values.

Source code in src/music_attribution/schemas/permissions.py
@model_validator(mode="after")
def validate_conditional_fields(self) -> PermissionEntry:
    """Validate fields required by specific permission values."""
    if self.value == PermissionValueEnum.ALLOW_WITH_ROYALTY and (
        self.royalty_rate is None or self.royalty_rate <= 0
    ):
        msg = "royalty_rate must be > 0 when value is ALLOW_WITH_ROYALTY"
        raise ValueError(msg)
    if self.value == PermissionValueEnum.ALLOW_WITH_ATTRIBUTION and self.attribution_requirement is None:
        msg = "attribution_requirement must be non-None when value is ALLOW_WITH_ATTRIBUTION"
        raise ValueError(msg)
    return self
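The `royalty_rate` field is a fraction rather than a percentage, so `Decimal("0.015")` means 1.5%. As an illustration of why `Decimal` suits this field, the sketch below applies a rate to a revenue amount with exact decimal arithmetic. The `royalty_due` helper and its `revenue` parameter are assumptions made for this example; the source does not specify how royalties are computed.

```python
from decimal import Decimal, ROUND_HALF_UP


def royalty_due(revenue: Decimal, royalty_rate: Decimal) -> Decimal:
    """Hypothetical helper: royalty owed on `revenue`, rounded to two decimals."""
    # Same positivity rule that validate_conditional_fields applies.
    if royalty_rate <= 0:
        raise ValueError("royalty_rate must be > 0")
    return (revenue * royalty_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


due = royalty_due(Decimal("1000.00"), Decimal("0.015"))  # 1.5% of 1000.00
```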

DelegationEntry

Bases: BaseModel

An entry in the permission delegation chain.

Models the chain of authority from the rights owner through intermediaries (manager, label, distributor). Each entry specifies what authority the entity has over the permissions.

ATTRIBUTE DESCRIPTION
entity_id

UUID of the entity in the delegation chain (a ResolvedEntity of type ARTIST, LABEL, etc.).

TYPE: UUID

role

The entity's role in the delegation chain (OWNER, MANAGER, LABEL, DISTRIBUTOR).

TYPE: DelegationRoleEnum

can_modify

Whether this entity can modify permission entries. Typically True for OWNER and MANAGER, False for DISTRIBUTOR.

TYPE: bool

can_delegate

Whether this entity can further delegate authority to another entity.

TYPE: bool

Examples:

>>> import uuid
>>> entry = DelegationEntry(
...     entity_id=uuid.uuid4(),
...     role=DelegationRoleEnum.OWNER,
...     can_modify=True,
...     can_delegate=True,
... )
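A delegation chain is typically consulted to decide whether a given entity may change a bundle. The following stdlib sketch uses a `ChainLink` dataclass as a stand-in for `DelegationEntry` with the same four fields. Both helper functions are hypothetical, and the "unbroken chain" rule is an assumption that the source does not state.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class ChainLink:
    """Stdlib stand-in for DelegationEntry (same four fields)."""
    entity_id: UUID
    role: str
    can_modify: bool
    can_delegate: bool


def may_modify(chain: list[ChainLink], entity_id: UUID) -> bool:
    """True if `entity_id` appears in the chain with can_modify set."""
    return any(link.entity_id == entity_id and link.can_modify for link in chain)


def chain_is_unbroken(chain: list[ChainLink]) -> bool:
    """Assumed rule: every link except the last must be allowed to delegate."""
    return all(link.can_delegate for link in chain[:-1])


owner, label, distributor = uuid4(), uuid4(), uuid4()
chain = [
    ChainLink(owner, "OWNER", can_modify=True, can_delegate=True),
    ChainLink(label, "LABEL", can_modify=True, can_delegate=True),
    ChainLink(distributor, "DISTRIBUTOR", can_modify=False, can_delegate=False),
]
```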

PermissionBundle

Bases: BaseModel

Machine-readable permission specification (BO-5).

The PermissionBundle is the boundary object used by the API/MCP Server pipeline to answer permission queries from AI platforms and other consumers. It implements the Permission Patchbay design from Teikari (2026), section 7.

Each bundle specifies permissions at a given scope (catalog, release, recording, or work) with an effective date range, a delegation chain showing who authorised the permissions, and a default permission for unlisted permission types.

ATTRIBUTE DESCRIPTION
schema_version

Semantic version of the PermissionBundle schema. Defaults to "1.0.0".

TYPE: str

permission_id

Unique identifier for this permission bundle. Auto-generated UUIDv4.

TYPE: UUID

entity_id

UUID of the rights holder entity (artist, label, publisher).

TYPE: UUID

scope

Granularity of this permission bundle (CATALOG, RELEASE, RECORDING, WORK).

TYPE: PermissionScopeEnum

scope_entity_id

UUID of the specific entity this permission applies to. Must be None when scope is CATALOG; must be non-None otherwise.

TYPE: UUID or None

permissions

List of individual permission entries. Must contain at least one.

TYPE: list of PermissionEntry

effective_from

UTC timestamp from which this bundle is effective. Must be timezone-aware.

TYPE: datetime

effective_until

UTC timestamp until which this bundle is effective. None means no expiry. Must be after effective_from when set.

TYPE: datetime or None

delegation_chain

Chain of authority from the rights owner through intermediaries.

TYPE: list of DelegationEntry

default_permission

Default response for permission types not explicitly listed in permissions. Typically ASK or DENY.

TYPE: PermissionValueEnum

created_by

UUID of the entity that created this permission bundle.

TYPE: UUID

updated_at

UTC timestamp of the most recent update. Must be timezone-aware.

TYPE: datetime

version

Monotonically increasing version number. Minimum 1.

TYPE: int

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> from decimal import Decimal
>>> bundle = PermissionBundle(
...     entity_id=uuid.uuid4(),
...     scope=PermissionScopeEnum.CATALOG,
...     permissions=[
...         PermissionEntry(
...             permission_type=PermissionTypeEnum.AI_TRAINING,
...             value=PermissionValueEnum.DENY,
...         ),
...     ],
...     effective_from=datetime.now(UTC),
...     default_permission=PermissionValueEnum.ASK,
...     created_by=uuid.uuid4(),
...     updated_at=datetime.now(UTC),
...     version=1,
... )
See Also

PermissionEntry : Individual permission with conditions.
DelegationEntry : Authority chain for permission provenance.

validate_timestamps classmethod

validate_timestamps(v: datetime) -> datetime

Timestamps must be timezone-aware.

Source code in src/music_attribution/schemas/permissions.py
@field_validator("effective_from", "updated_at")
@classmethod
def validate_timestamps(cls, v: datetime) -> datetime:
    """Timestamps must be timezone-aware."""
    if v.tzinfo is None:
        msg = "Timestamps must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_effective_until classmethod

validate_effective_until(
    v: datetime | None,
) -> datetime | None

effective_until must be timezone-aware when set.

Source code in src/music_attribution/schemas/permissions.py
@field_validator("effective_until")
@classmethod
def validate_effective_until(cls, v: datetime | None) -> datetime | None:
    """effective_until must be timezone-aware when set."""
    if v is not None and v.tzinfo is None:
        msg = "effective_until must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_scope_consistency

validate_scope_consistency() -> PermissionBundle

scope_entity_id must be None for CATALOG, required for others.

Source code in src/music_attribution/schemas/permissions.py
@model_validator(mode="after")
def validate_scope_consistency(self) -> PermissionBundle:
    """scope_entity_id must be None for CATALOG, required for others."""
    if self.scope == PermissionScopeEnum.CATALOG and self.scope_entity_id is not None:
        msg = "scope_entity_id must be None when scope is CATALOG"
        raise ValueError(msg)
    if self.scope != PermissionScopeEnum.CATALOG and self.scope_entity_id is None:
        msg = f"scope_entity_id must be non-None when scope is {self.scope}"
        raise ValueError(msg)
    return self

validate_effective_range

validate_effective_range() -> PermissionBundle

effective_from must be before effective_until.

Source code in src/music_attribution/schemas/permissions.py
@model_validator(mode="after")
def validate_effective_range(self) -> PermissionBundle:
    """effective_from must be before effective_until."""
    if self.effective_until is not None and self.effective_from >= self.effective_until:
        msg = "effective_from must be before effective_until"
        raise ValueError(msg)
    return self
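Answering a permission query against a bundle involves two steps: checking that the bundle is in effect, then looking up the requested permission type and falling back to `default_permission` when it is unlisted. The sketch below illustrates both with plain strings standing in for the enum members and a dict standing in for the list of `PermissionEntry` objects. The helper names and the inclusive-start, exclusive-end treatment of the effective window are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_effective(at: datetime, effective_from: datetime,
                 effective_until: Optional[datetime]) -> bool:
    """Assumed semantics: inclusive start, exclusive end, None means no expiry."""
    return effective_from <= at and (effective_until is None or at < effective_until)


def resolve(permissions: dict[str, str], default_permission: str,
            permission_type: str) -> str:
    """Return the listed value, or the bundle default for unlisted types."""
    return permissions.get(permission_type, default_permission)


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
listed = {"AI_TRAINING": "DENY"}  # mirrors the PermissionBundle example above
```

With the example bundle, an `AI_TRAINING` query resolves to `DENY`, while any unlisted type such as `REMIX` resolves to the default, `ASK`.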

Feedback

feedback

FeedbackCard boundary object schema (BO-4).

Structured feedback from domain experts (artists, managers, musicologists, producers). Flows from the Chat Interface back into the Attribution Engine for calibration updates. Ref: Zhou et al., 2023 -- FeedbackCards.

The FeedbackCard is the primary reverse-flow boundary object in the five-pipeline architecture, enabling human-in-the-loop calibration. When a domain expert reviews an attribution and provides corrections, these are captured in a FeedbackCard that feeds back into the Attribution Engine for confidence recalibration.

See Also

music_attribution.schemas.attribution : The AttributionRecord being reviewed.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 6.

Correction

Bases: BaseModel

A specific correction to an attribution field.

Represents a single field-level correction proposed by a reviewer. Each correction includes the current (incorrect) value, the proposed corrected value, and the reviewer's confidence in their correction.

ATTRIBUTE DESCRIPTION
field

Name of the field being corrected (e.g., "role", "entity_name", "confidence").

TYPE: str

current_value

The current (incorrect) value of the field as a string.

TYPE: str

corrected_value

The proposed correct value as a string.

TYPE: str

entity_id

UUID of the specific entity this correction applies to, if the correction is entity-specific (e.g., correcting a credit role). None for record-level corrections.

TYPE: UUID or None

confidence_in_correction

The reviewer's confidence that their correction is accurate, range [0.0, 1.0]. Higher values from authoritative reviewers (e.g., the artist) carry more weight during recalibration.

TYPE: float

evidence

Free-text description of the evidence supporting this correction (e.g., "Listed in vinyl liner notes, track 3").

TYPE: str or None

Examples:

>>> import uuid
>>> correction = Correction(
...     field="role",
...     current_value="PERFORMER",
...     corrected_value="PRODUCER",
...     entity_id=uuid.uuid4(),
...     confidence_in_correction=0.95,
...     evidence="Confirmed in studio session notes",
... )

FeedbackCard

Bases: BaseModel

Structured feedback from a domain expert (BO-4).

The FeedbackCard is the reverse-flow boundary object in the five-pipeline architecture, flowing from the Chat Interface back into the Attribution Engine. It captures structured corrections and an overall assessment from a domain expert who reviewed an AttributionRecord.

A valid FeedbackCard must contain corrections, free_text, or both. The center_bias_flag is automatically set when the overall assessment falls in the [0.45, 0.55] range, indicating potential anchoring bias toward the midpoint.

ATTRIBUTE DESCRIPTION
schema_version

Semantic version of the FeedbackCard schema. Defaults to "1.0.0".

TYPE: str

feedback_id

Unique identifier for this feedback card. Auto-generated UUIDv4.

TYPE: UUID

attribution_id

UUID of the AttributionRecord being reviewed.

TYPE: UUID

reviewer_id

Identifier of the reviewer (may be an email, username, or external ID).

TYPE: str

reviewer_role

Domain expertise of the reviewer (ARTIST, MANAGER, MUSICOLOGIST, PRODUCER, FAN).

TYPE: ReviewerRoleEnum

attribution_version

Version of the AttributionRecord at the time of review. Minimum 1. Prevents stale feedback on updated records.

TYPE: int

corrections

Specific field-level corrections proposed by the reviewer. May be empty if only free-text feedback is provided.

TYPE: list of Correction

overall_assessment

Reviewer's overall assessment of the attribution quality, range [0.0, 1.0]. 0.0 = completely wrong; 1.0 = perfect.

TYPE: float

center_bias_flag

Automatically set to True if overall_assessment is in [0.45, 0.55], indicating potential anchoring bias.

TYPE: bool

free_text

Free-text feedback for nuances not captured by structured corrections.

TYPE: str or None

evidence_type

Type of evidence supporting the feedback (LINER_NOTES, MEMORY, DOCUMENT, SESSION_NOTES, OTHER).

TYPE: EvidenceTypeEnum

submitted_at

UTC timestamp when the feedback was submitted. Must be timezone-aware.

TYPE: datetime

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> card = FeedbackCard(
...     attribution_id=uuid.uuid4(),
...     reviewer_id="imogen.heap@example.com",
...     reviewer_role=ReviewerRoleEnum.ARTIST,
...     attribution_version=1,
...     corrections=[
...         Correction(
...             field="role",
...             current_value="PERFORMER",
...             corrected_value="SONGWRITER",
...             confidence_in_correction=1.0,
...         ),
...     ],
...     overall_assessment=0.7,
...     evidence_type=EvidenceTypeEnum.MEMORY,
...     submitted_at=datetime.now(UTC),
... )
See Also

AttributionRecord : The record being reviewed.
Correction : Individual field-level corrections.

validate_submitted_at classmethod

validate_submitted_at(v: datetime) -> datetime

submitted_at must be timezone-aware.

Source code in src/music_attribution/schemas/feedback.py
@field_validator("submitted_at")
@classmethod
def validate_submitted_at(cls, v: datetime) -> datetime:
    """submitted_at must be timezone-aware."""
    if v.tzinfo is None:
        msg = "submitted_at must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_content_not_empty

validate_content_not_empty() -> FeedbackCard

A feedback card must have corrections or free_text.

Source code in src/music_attribution/schemas/feedback.py
@model_validator(mode="after")
def validate_content_not_empty(self) -> FeedbackCard:
    """A feedback card must have corrections or free_text."""
    if not self.corrections and self.free_text is None:
        msg = "FeedbackCard must have non-empty corrections or non-None free_text"
        raise ValueError(msg)
    return self

validate_center_bias

validate_center_bias() -> FeedbackCard

Set center_bias_flag if overall_assessment is in [CENTER_BIAS_LOW, CENTER_BIAS_HIGH].

Source code in src/music_attribution/schemas/feedback.py
@model_validator(mode="after")
def validate_center_bias(self) -> FeedbackCard:
    """Set center_bias_flag if overall_assessment is in [CENTER_BIAS_LOW, CENTER_BIAS_HIGH]."""
    if CENTER_BIAS_LOW <= self.overall_assessment <= CENTER_BIAS_HIGH:
        object.__setattr__(self, "center_bias_flag", True)
    return self
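The center-bias rule is a closed-interval test, so both endpoints trigger the flag. A minimal sketch follows, assuming `CENTER_BIAS_LOW` and `CENTER_BIAS_HIGH` hold the 0.45 and 0.55 bounds quoted in the FeedbackCard description; the constants' actual definitions are not shown on this page, and `has_center_bias` is a hypothetical helper.

```python
CENTER_BIAS_LOW = 0.45   # assumed value, taken from the FeedbackCard description
CENTER_BIAS_HIGH = 0.55  # assumed value, taken from the FeedbackCard description


def has_center_bias(overall_assessment: float) -> bool:
    """True when the assessment falls in the closed interval [0.45, 0.55]."""
    return CENTER_BIAS_LOW <= overall_assessment <= CENTER_BIAS_HIGH


flags = {score: has_center_bias(score) for score in (0.2, 0.45, 0.5, 0.55, 0.7)}
```

Note that the validator shown above only ever sets the flag to True; it does not clear a True value supplied by the caller.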

Uncertainty

uncertainty

Uncertainty-aware provenance schema models.

Provides decomposed uncertainty tracking for every step of the attribution pipeline. These models are attached to ProvenanceEvent and AttributionRecord objects, enabling transparent communication of why a confidence score is what it is, not just what it is.

Academic grounding:

  • UProp (Duan 2025, arXiv:2506.17419) -- intrinsic/extrinsic decomposition of uncertainty propagation across pipeline steps.
  • Liu (2025, arXiv:2503.15850) -- 4-dimensional uncertainty framework (input, reasoning, parameter, prediction).
  • Yanez (2025, Patterns) -- confidence-weighted source integration for multi-source fusion.
  • Tian (2025, arXiv:2508.06225) -- TH-Score for overconfidence detection in LLM-based systems.
  • Tripathi (2025, arXiv:2506.23464) -- H-Score and Expected Calibration Improvement (ECI) metrics.
  • Zhang (2026, arXiv:2601.15778) -- trajectory-level Holistic Trajectory Calibration (HTC).
See Also

music_attribution.schemas.attribution : Uses these models in provenance.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5 (uncertainty framework).

StepUncertainty

Bases: BaseModel

Per-step uncertainty decomposition (UProp, Duan 2025).

Tracks intrinsic (data noise) and extrinsic (model/pipeline) uncertainty for each processing step in the attribution pipeline, plus an optional 4-dimensional decomposition (Liu 2025). This enables fine-grained analysis of where uncertainty enters and accumulates.

The total_uncertainty must be >= intrinsic_uncertainty (validated at runtime), since total includes both intrinsic and extrinsic components.

ATTRIBUTE DESCRIPTION
step_id

Unique identifier for this pipeline step (e.g., "etl-musicbrainz", "resolution-fuzzy").

TYPE: str

step_name

Human-readable name of the pipeline step (e.g., "MusicBrainz ETL", "Fuzzy String Resolution").

TYPE: str

step_index

Zero-based position of this step in the pipeline sequence.

TYPE: int

stated_confidence

Raw confidence before calibration, range [0.0, 1.0].

TYPE: float

calibrated_confidence

Confidence after post-hoc calibration, range [0.0, 1.0]. May differ significantly from stated_confidence if the step exhibits systematic over- or under-confidence.

TYPE: float

intrinsic_uncertainty

Uncertainty from the input data itself (noise, conflicts, missing fields), range [0.0, 1.0].

TYPE: float

extrinsic_uncertainty

Uncertainty from the model/algorithm (embedding limitations, threshold sensitivity), range [0.0, 1.0].

TYPE: float

total_uncertainty

Combined uncertainty, range [0.0, 1.0]. Must be >= intrinsic_uncertainty.

TYPE: float

input_uncertainty

4-D decomposition: input dimension (Liu 2025), range [0.0, 1.0]. None if the 4-D decomposition was not computed.

TYPE: float or None

reasoning_uncertainty

4-D decomposition: reasoning dimension (Liu 2025), range [0.0, 1.0]. None if the 4-D decomposition was not computed.

TYPE: float or None

parameter_uncertainty

4-D decomposition: parameter dimension (Liu 2025), range [0.0, 1.0]. None if the 4-D decomposition was not computed.

TYPE: float or None

prediction_uncertainty

4-D decomposition: prediction dimension (Liu 2025), range [0.0, 1.0]. None if the 4-D decomposition was not computed.

TYPE: float or None

uncertainty_sources

Classification of uncertainty sources active in this step (INTRINSIC, EXTRINSIC, ALEATORIC, EPISTEMIC).

TYPE: list of UncertaintySourceEnum

confidence_method

Method used to produce the confidence estimate for this step.

TYPE: ConfidenceMethodEnum

preceding_step_ids

IDs of pipeline steps that feed into this step. Used for UProp uncertainty propagation tracking.

TYPE: list of str

Examples:

>>> step = StepUncertainty(
...     step_id="etl-musicbrainz",
...     step_name="MusicBrainz ETL",
...     step_index=0,
...     stated_confidence=0.87,
...     calibrated_confidence=0.82,
...     intrinsic_uncertainty=0.10,
...     extrinsic_uncertainty=0.05,
...     total_uncertainty=0.15,
...     confidence_method=ConfidenceMethodEnum.SELF_REPORT,
... )

validate_total_ge_intrinsic

validate_total_ge_intrinsic() -> StepUncertainty

total_uncertainty must be >= intrinsic_uncertainty.

Source code in src/music_attribution/schemas/uncertainty.py
@model_validator(mode="after")
def validate_total_ge_intrinsic(self) -> StepUncertainty:
    """total_uncertainty must be >= intrinsic_uncertainty."""
    if self.total_uncertainty < self.intrinsic_uncertainty:
        msg = "total_uncertainty must be >= intrinsic_uncertainty"
        raise ValueError(msg)
    return self
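The source does not state how `total_uncertainty` is derived from its components; it only requires that the total is not smaller than the intrinsic part. The values in the StepUncertainty example (0.10, 0.05, 0.15) are consistent with a simple sum, so the sketch below uses a capped sum as one possible rule. `combine_additive` is a hypothetical helper, and the additive rule is an assumption rather than the project's method.

```python
import math


def combine_additive(intrinsic: float, extrinsic: float) -> float:
    """Assumed combination rule: sum the two components, capped at 1.0."""
    for name, value in (("intrinsic", intrinsic), ("extrinsic", extrinsic)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} uncertainty must be in [0.0, 1.0]")
    return min(1.0, intrinsic + extrinsic)


total = combine_additive(0.10, 0.05)   # the values from the example above
capped = combine_additive(0.80, 0.60)  # the sum exceeds 1.0, so it is capped

# Float addition is inexact, so compare with a tolerance rather than ==.
matches_example = math.isclose(total, 0.15)
```

Since extrinsic uncertainty is non-negative, any total produced this way satisfies the `total_uncertainty >= intrinsic_uncertainty` check.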

SourceContribution

Bases: BaseModel

Per-source confidence with calibration quality (Yanez 2025).

Tracks how much each data source contributed to the final attribution, with calibration quality indicating how reliable that source's confidence estimates historically are. Sources with higher calibration_quality receive higher weights in the aggregation.

ATTRIBUTE DESCRIPTION
source

The data source (e.g., MUSICBRAINZ, DISCOGS, ARTIST_INPUT).

TYPE: SourceEnum

confidence

This source's confidence in its contribution, range [0.0, 1.0].

TYPE: float

weight

Normalised weight of this source in the final aggregation, range [0.0, 1.0]. Weights across all sources sum to 1.0.

TYPE: float

calibration_quality

Historical calibration quality of this source's confidence estimates, range [0.0, 1.0]. 1.0 = perfectly calibrated (stated confidence matches empirical accuracy).

TYPE: float

record_count

Number of records this source contributed to the attribution. Non-negative.

TYPE: int

is_human

Whether this source is human-provided (e.g., ARTIST_INPUT). Human sources may receive preferential weighting for subjective fields.

TYPE: bool

Examples:

>>> contrib = SourceContribution(
...     source=SourceEnum.MUSICBRAINZ,
...     confidence=0.90,
...     weight=0.45,
...     calibration_quality=0.85,
...     record_count=3,
... )
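To make the relationship between `calibration_quality`, `weight`, and the fused confidence concrete, here is a minimal sketch. It assumes weights are proportional to calibration quality and that the fused score is a weighted average; the source states only that better-calibrated sources receive higher weights and that weights sum to 1.0. The helper names and the sample figures are hypothetical.

```python
import math


def normalise_weights(calibration_quality: dict[str, float]) -> dict[str, float]:
    """Assumed scheme: weight each source in proportion to its calibration quality."""
    total = sum(calibration_quality.values())
    if total <= 0:
        raise ValueError("at least one source needs calibration_quality > 0")
    return {source: quality / total for source, quality in calibration_quality.items()}


def fuse(confidence: dict[str, float], weight: dict[str, float]) -> float:
    """Weighted average of per-source confidences."""
    return sum(confidence[source] * weight[source] for source in confidence)


quality = {"MUSICBRAINZ": 0.85, "DISCOGS": 0.70, "ARTIST_INPUT": 0.95}
weights = normalise_weights(quality)
fused = fuse({"MUSICBRAINZ": 0.90, "DISCOGS": 0.80, "ARTIST_INPUT": 0.99}, weights)

weights_sum_to_one = math.isclose(sum(weights.values()), 1.0)
```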

CalibrationMetadata

Bases: BaseModel

Per-step calibration metrics (Tian 2025 TH-Score).

Records calibration quality for confidence scores, including expected calibration error (ECE), calibration set size, and the method used. Lower ECE indicates better calibration (stated confidence matches empirical accuracy).

ATTRIBUTE DESCRIPTION
expected_calibration_error

Expected Calibration Error (ECE): the average absolute difference between confidence and accuracy across confidence bins, weighted by bin size. Non-negative. Lower is better; 0.0 = perfectly calibrated.

TYPE: float

calibration_set_size

Number of examples used for calibration. Larger sets give more reliable ECE estimates. Non-negative.

TYPE: int

status

Current calibration status (CALIBRATED, UNCALIBRATED, PENDING).

TYPE: CalibrationStatusEnum

method

Name of the calibration method used (e.g., "platt_scaling", "isotonic_regression", "temperature_scaling"). None if uncalibrated.

TYPE: str or None

Examples:

>>> cal = CalibrationMetadata(
...     expected_calibration_error=0.03,
...     calibration_set_size=500,
...     status=CalibrationStatusEnum.CALIBRATED,
...     method="platt_scaling",
... )
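For reference, the standard binned formulation of ECE can be written in a few lines. This sketch uses equal-width bins and weights each bin by its share of the samples; the binning scheme the project actually uses is not specified in the source, so `n_bins` and the function name are assumptions.

```python
def expected_calibration_error(confidences: list[float], correct: list[bool],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins: sum over bins of (n_bin / N) * |accuracy - confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        # Confidence 1.0 is placed in the last bin rather than an out-of-range one.
        bins[min(int(conf * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        ece += (len(members) / len(confidences)) * abs(accuracy - avg_conf)
    return ece


# Four predictions at 0.75 confidence, three of them right: calibrated in this bin.
perfect = expected_calibration_error([0.75] * 4, [True, True, True, False])
# Four predictions at 0.75 confidence, only one right: off by 0.5.
poor = expected_calibration_error([0.75] * 4, [True, False, False, False])
```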

OverconfidenceReport

Bases: BaseModel

Overconfidence detection report (Tripathi 2025 H-Score, ECI).

Detects when stated confidence exceeds actual accuracy, a common failure mode in LLM-based systems. The overconfidence_gap is the primary diagnostic: positive = overconfident, negative = underconfident, zero = perfectly calibrated.

ATTRIBUTE DESCRIPTION
stated_confidence

The system's stated confidence, range [0.0, 1.0].

TYPE: float

actual_accuracy

Empirically measured accuracy on a validation set, range [0.0, 1.0].

TYPE: float

overconfidence_gap

stated_confidence - actual_accuracy. Positive values indicate overconfidence; negative values indicate underconfidence. Can range from -1.0 to 1.0.

TYPE: float

th_score

TH-Score from Tian (2025). Measures hallucination tendency. None if not computed.

TYPE: float or None

h_score

H-Score from Tripathi (2025). Measures honesty of confidence estimates. None if not computed.

TYPE: float or None

eci

Expected Calibration Improvement (ECI) from Tripathi (2025). Estimates how much calibration could be improved. None if not computed.

TYPE: float or None

Examples:

>>> report = OverconfidenceReport(
...     stated_confidence=0.92,
...     actual_accuracy=0.85,
...     overconfidence_gap=0.07,
...     th_score=0.12,
... )
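The gap is plain subtraction, but floating-point rounding means it should be compared with a tolerance. The sketch below computes the gap for the example values and labels its sign; `describe` and its `tolerance` parameter are hypothetical. No validator is listed for this model on this page, so keeping `overconfidence_gap` consistent with the other two fields appears to be the caller's responsibility.

```python
def overconfidence_gap(stated_confidence: float, actual_accuracy: float) -> float:
    """stated_confidence - actual_accuracy, as defined for OverconfidenceReport."""
    return stated_confidence - actual_accuracy


def describe(gap: float, tolerance: float = 1e-9) -> str:
    """Hypothetical labelling helper; `tolerance` absorbs float rounding."""
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "calibrated"


gap = overconfidence_gap(0.92, 0.85)  # the values from the example above
label = describe(gap)
```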

TrajectoryCalibration

Bases: BaseModel

Trajectory-level calibration (Zhang 2026, HTC).

Tracks confidence dynamics across the full pipeline, treating the sequence of confidence scores at each step as a trajectory. The trajectory shape (increasing, decreasing, stable, volatile) is itself a calibration signal: volatile trajectories often indicate unreliable final confidence.

The optional htc_feature_vector is a 48-dimensional feature vector extracted from the trajectory for use with the HTC calibration method.

ATTRIBUTE DESCRIPTION
trajectory_id

Unique identifier for this trajectory (typically matches the attribution record ID).

TYPE: str

step_count

Number of pipeline steps in the trajectory. Minimum 1.

TYPE: int

confidence_trend

Classified trend of confidence across steps (INCREASING, DECREASING, STABLE, VOLATILE).

TYPE: ConfidenceTrendEnum

initial_confidence

Confidence at the first pipeline step, range [0.0, 1.0].

TYPE: float

final_confidence

Confidence at the last pipeline step, range [0.0, 1.0].

TYPE: float

htc_feature_vector

48-dimensional feature vector for HTC calibration (Zhang 2026). Must be exactly length 48 when provided. None if HTC is not used.

TYPE: list of float or None

Examples:

>>> traj = TrajectoryCalibration(
...     trajectory_id="attr-12345",
...     step_count=4,
...     confidence_trend=ConfidenceTrendEnum.INCREASING,
...     initial_confidence=0.65,
...     final_confidence=0.92,
... )

validate_htc_vector_length classmethod

validate_htc_vector_length(
    v: list[float] | None,
) -> list[float] | None

HTC feature vector must be length 48 when provided.

Source code in src/music_attribution/schemas/uncertainty.py
@field_validator("htc_feature_vector")
@classmethod
def validate_htc_vector_length(cls, v: list[float] | None) -> list[float] | None:
    """HTC feature vector must be length 48 when provided."""
    if v is not None and len(v) != 48:
        msg = "htc_feature_vector must be length 48"
        raise ValueError(msg)
    return v
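The source does not describe how a trajectory is assigned to INCREASING, DECREASING, STABLE, or VOLATILE. Purely as an illustration, the sketch below labels a sequence of per-step confidences using a hypothetical `tolerance` threshold: steps that move by less than the tolerance count as flat, and meaningful movement in both directions counts as volatile. This is an assumed rule, not the HTC method from Zhang (2026).

```python
def classify_trend(confidences: list[float], tolerance: float = 0.02) -> str:
    """Hypothetical rule for labelling a confidence trajectory.

    Steps that change by no more than `tolerance` are treated as flat.
    """
    if not confidences:
        raise ValueError("a trajectory needs at least one step")
    deltas = [b - a for a, b in zip(confidences, confidences[1:])]
    rises = sum(1 for d in deltas if d > tolerance)
    falls = sum(1 for d in deltas if d < -tolerance)
    if rises and falls:
        return "VOLATILE"  # moved meaningfully in both directions
    if rises:
        return "INCREASING"
    if falls:
        return "DECREASING"
    return "STABLE"


trend = classify_trend([0.65, 0.74, 0.85, 0.92])  # four steps, 0.65 to 0.92
```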

UncertaintyAwareProvenance

Bases: BaseModel

Top-level uncertainty summary for an AttributionRecord.

Aggregates step-level uncertainties, source contributions, calibration metadata, overconfidence diagnostics, and trajectory calibration into a single summary. Attached to each AttributionRecord as uncertainty_summary.

This is the primary structure for answering "why is the confidence what it is?" -- enabling transparent uncertainty communication to end users and downstream systems.

ATTRIBUTE DESCRIPTION
steps

Per-step uncertainty decomposition for each pipeline step. Ordered by step_index.

TYPE: list of StepUncertainty

source_contributions

Per-source confidence and weight breakdown. Shows how much each data source influenced the final score.

TYPE: list of SourceContribution

calibration

Overall calibration metrics for the record's confidence score. None if calibration has not been performed.

TYPE: CalibrationMetadata or None

overconfidence

Overconfidence diagnostic report. None if not computed.

TYPE: OverconfidenceReport or None

trajectory

Trajectory-level calibration data (HTC). None if trajectory analysis was not performed.

TYPE: TrajectoryCalibration or None

total_uncertainty

Aggregated total uncertainty across all steps, range [0.0, 1.0]. Defaults to 0.0.

TYPE: float

dominant_uncertainty_source

The primary source of uncertainty in this record (INTRINSIC, EXTRINSIC, ALEATORIC, or EPISTEMIC). None if not determined.

TYPE: UncertaintySourceEnum or None

Examples:

>>> summary = UncertaintyAwareProvenance(
...     total_uncertainty=0.18,
...     dominant_uncertainty_source=UncertaintySourceEnum.EPISTEMIC,
... )
See Also

AttributionRecord : Parent record containing this summary.
StepUncertainty : Per-step decomposition detail.