Schemas¶
Pydantic boundary objects that define clean data-flow contracts between pipelines.
Enums¶
enums
¶
Shared enumerations for boundary object schemas.
All enums are StrEnum for JSON-friendly serialization, ensuring that
values survive round-trips through JSON/YAML without requiring custom
serializers. Domain-extensible enums (EntityTypeEnum,
RelationshipTypeEnum, PermissionTypeEnum) can be extended via
domain overlay YAML in a future phase.
This module provides the single source of truth for all categorical values used across the five-pipeline architecture (ETL, Entity Resolution, Attribution Engine, API/MCP Server, Chat Interface). Enums are grouped by domain:
- Core pipeline -- source identification, entity types, resolution
- Attribution -- credit roles, assurance levels, provenance events
- Permissions -- MCP Permission Patchbay types, values, scopes
- Uncertainty -- uncertainty decomposition taxonomy, calibration
- Commercial landscape -- training attribution, watermarking, revenue
- Regulatory/compliance -- EU AI Act, ISO 42001, DSM Directive
See Also
Teikari, P. (2026). Music Attribution with Transparent Confidence. SSRN No. 6109087 -- sections 5-7 for assurance levels and permission patchbay design.
SourceEnum
¶
Bases: StrEnum
Data source identifiers for the ETL pipeline.
Each value represents an external data source from which music metadata is ingested. The source identity is preserved throughout the entire pipeline so that downstream confidence scoring can apply per-source reliability weights. See Teikari (2026), section 5.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `MUSICBRAINZ` | MusicBrainz open database. Community-curated, high coverage for Western popular music. Provides MBIDs. |
| `DISCOGS` | Discogs marketplace/database. Strong on vinyl releases, credits, and label information. Provides Discogs numeric IDs. |
| `ACOUSTID` | AcoustID audio fingerprint service. Matches audio signals to MusicBrainz recordings via Chromaprint fingerprints. |
| `ARTIST_INPUT` | Direct input from the artist or their representative. Highest authority for creative intent, but unverifiable by third parties. |
| `FILE_METADATA` | Embedded file metadata (ID3 tags, Vorbis comments, etc.) extracted from the audio file itself. Quality varies widely. |
Examples:
>>> SourceEnum.MUSICBRAINZ
<SourceEnum.MUSICBRAINZ: 'MUSICBRAINZ'>
>>> SourceEnum("ARTIST_INPUT")
<SourceEnum.ARTIST_INPUT: 'ARTIST_INPUT'>
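The per-source reliability weighting mentioned above can be sketched as follows. This is a minimal illustration, not part of the module: the `RELIABILITY` weights and `weighted_confidence` helper are assumptions, and real weights would come from calibration data.

```python
from enum import Enum


# Local stand-in mirroring the documented StrEnum values.
class SourceEnum(str, Enum):
    MUSICBRAINZ = "MUSICBRAINZ"
    DISCOGS = "DISCOGS"
    ACOUSTID = "ACOUSTID"
    ARTIST_INPUT = "ARTIST_INPUT"
    FILE_METADATA = "FILE_METADATA"


# Hypothetical per-source reliability weights (illustrative only).
RELIABILITY = {
    SourceEnum.MUSICBRAINZ: 0.90,
    SourceEnum.DISCOGS: 0.85,
    SourceEnum.ACOUSTID: 0.80,
    SourceEnum.ARTIST_INPUT: 0.95,
    SourceEnum.FILE_METADATA: 0.50,
}


def weighted_confidence(claims: dict[SourceEnum, float]) -> float:
    """Reliability-weighted average of per-source confidence claims."""
    total = sum(RELIABILITY[s] for s in claims)
    return sum(RELIABILITY[s] * c for s, c in claims.items()) / total
```

Because the source identity survives the whole pipeline, the weighting can be applied at scoring time without re-querying the source.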
EntityTypeEnum
¶
Bases: StrEnum
Music entity types within the knowledge graph.
Follows the MusicBrainz entity model with extensions for credits. Entity resolution produces a unified graph where each node is typed by one of these values.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `RECORDING` | A specific audio recording (a unique performance captured in a studio or live). Identified by ISRC. |
| `WORK` | An abstract musical composition independent of any recording. Identified by ISWC. A single work may have many recordings. |
| `ARTIST` | A person or group who creates or performs music. Identified by ISNI or IPI. |
| `RELEASE` | A packaged product (album, single, EP) containing one or more recordings. |
| `LABEL` | A record label or publishing entity that owns or distributes releases. |
| `CREDIT` | A specific attribution credit linking an artist to a recording or work in a particular role (e.g., producer, songwriter). |
RelationshipTypeEnum
¶
Bases: StrEnum
Relationship types between music entities in the knowledge graph.
Edges in the entity graph are typed by these values. Each relationship
connects a source entity (typically an artist) to a target entity
(typically a recording or work). Relationship types map loosely to
CreditRoleEnum but represent graph edges rather than attribution
line items.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `PERFORMED` | Artist performed on a recording (vocalist, instrumentalist). |
| `WROTE` | Artist wrote the underlying composition (songwriter, composer). |
| `PRODUCED` | Artist produced the recording (creative/technical oversight). |
| `ENGINEERED` | Artist served as recording engineer. |
| `ARRANGED` | Artist arranged the composition for a specific performance. |
| `MASTERED` | Artist mastered the final audio (mastering engineer). |
| `MIXED` | Artist mixed the recording (mixing engineer). |
| `FEATURED` | Artist is a featured guest on the recording. |
| `SAMPLED` | Recording contains a sample from the target recording. |
| `REMIXED` | Recording is a remix of the target recording. |
ResolutionMethodEnum
¶
Bases: StrEnum
Entity resolution methods used to merge NormalizedRecords.
The Entity Resolution pipeline may use one or more of these methods
to determine whether two NormalizedRecords refer to the same real-world
entity. Methods are ordered roughly by computational cost and
reliability. The chosen method is recorded on each ResolvedEntity
for provenance tracing.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `EXACT_ID` | Exact match on a standard identifier (ISRC, ISWC, ISNI, MBID). Highest confidence, lowest cost. |
| `FUZZY_STRING` | Fuzzy string matching on names/titles (e.g., Levenshtein, Jaro-Winkler). Handles typos and transliterations. |
| `EMBEDDING` | Semantic similarity via vector embeddings (e.g., sentence transformers). Handles paraphrases and abbreviations. |
| `GRAPH` | Graph-based resolution using relationship structure (e.g., Splink). Exploits co-occurrence patterns in the entity graph. |
| `LLM` | LLM-assisted resolution for ambiguous cases. Most expensive, used as a fallback when other methods are inconclusive. |
| `MANUAL` | Human-in-the-loop resolution by a domain expert. Triggered when automated methods produce low-confidence matches. |
AssuranceLevelEnum
¶
Bases: StrEnum
Tiered provenance classification (A0--A3).
Maps to the assurance framework from Teikari (2026), section 6. Higher levels require stronger evidence chains. The assurance level determines how much trust downstream consumers can place in an attribution record.
Ordered by verification depth: LEVEL_0 < LEVEL_1 <
LEVEL_2 < LEVEL_3.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `LEVEL_0` | No provenance data. Self-declared or unknown origin. Corresponds to A0 in the manuscript. |
| `LEVEL_1` | Single source. Documented but not independently verified. Corresponds to A1. Typical for file-metadata-only records. |
| `LEVEL_2` | Multiple sources agree. Cross-referenced and corroborated across at least two independent data sources. Corresponds to A2. |
| `LEVEL_3` | Artist-verified or authority-verified. Highest assurance level. Requires explicit confirmation from the rights holder or an authoritative registry (e.g., ISNI). Corresponds to A3. |
Examples:
>>> AssuranceLevelEnum.LEVEL_3
<AssuranceLevelEnum.LEVEL_3: 'LEVEL_3'>
>>> AssuranceLevelEnum("LEVEL_0") < AssuranceLevelEnum("LEVEL_3")
True
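The ordering in the doctest works because the member values `LEVEL_0`..`LEVEL_3` sort lexicographically in verification-depth order, so plain string comparison suffices. One practical consequence, sketched with a local stand-in for the enum (`chain_assurance` is illustrative, not part of the module): the assurance of a multi-record evidence chain is its weakest link.

```python
from enum import Enum


# Local stand-in mirroring the documented StrEnum values.
class AssuranceLevelEnum(str, Enum):
    LEVEL_0 = "LEVEL_0"
    LEVEL_1 = "LEVEL_1"
    LEVEL_2 = "LEVEL_2"
    LEVEL_3 = "LEVEL_3"


def chain_assurance(levels: list[AssuranceLevelEnum]) -> AssuranceLevelEnum:
    """Weakest link: min() uses str ordering, which matches depth order here."""
    return min(levels)
```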
ConflictSeverityEnum
¶
Bases: StrEnum
Conflict severity levels between data sources.
When entity resolution encounters disagreements between sources for the same field (e.g., different release dates or artist names), the conflict is assigned a severity level that determines whether it can be auto-resolved or requires human review.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `LOW` | Minor discrepancy, auto-resolvable. Example: trailing whitespace differences in artist names. |
| `MEDIUM` | Significant discrepancy requiring attention but not blocking. Example: different release dates within the same year. |
| `HIGH` | Major disagreement likely indicating a data quality issue. Example: different artist names for the same ISRC. |
| `CRITICAL` | Fundamental conflict that blocks attribution. Example: contradictory songwriter credits from authoritative sources. Always triggers |
CreditRoleEnum
¶
Bases: StrEnum
Credit roles for music attribution.
These roles appear in Credit objects within AttributionRecord
and as prediction targets in conformal prediction sets. The taxonomy
covers the most common roles found across MusicBrainz, Discogs, and
industry metadata standards (DDEX, CWR).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `PERFORMER` | Primary performer (vocalist or lead instrumentalist). |
| `SONGWRITER` | Songwriter (both music and lyrics). Use |
| `COMPOSER` | Composed the music (melody, harmony, structure). |
| `LYRICIST` | Wrote the lyrics/text. |
| `PRODUCER` | Music producer (creative and/or technical oversight of the recording process). |
| `ENGINEER` | Recording/tracking engineer. |
| `MIXING_ENGINEER` | Mixing engineer (balance, EQ, effects in post-production). |
| `MASTERING_ENGINEER` | Mastering engineer (final audio processing for distribution). |
| `ARRANGER` | Arranged the composition for a specific performance context. |
| `SESSION_MUSICIAN` | Session musician (hired instrumentalist, not a band member). |
| `FEATURED_ARTIST` | Featured guest artist on the recording. |
| `CONDUCTOR` | Orchestra or ensemble conductor. |
| `DJ` | DJ (for electronic music, turntablism, or mix compilations). |
| `REMIXER` | Created a remix of the original recording. |
ProvenanceEventTypeEnum
¶
Bases: StrEnum
Provenance event types for the attribution audit trail.
Every ProvenanceEvent in an AttributionRecord is typed by one
of these values. Together they form an immutable audit chain showing
how an attribution was constructed and refined over time.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `FETCH` | Data fetched from an external source (ETL pipeline). Records which source was queried and how many records were returned. |
| `RESOLVE` | Entity resolution step. Records the method used and the input/output record counts. |
| `SCORE` | Confidence scoring/calibration step. Records the previous and new confidence values and the scoring method applied. |
| `REVIEW` | Human review event. Records who reviewed, which feedback card was applied, and how many corrections were made. |
| `UPDATE` | Record update event. Records version bump, fields changed, and what triggered the update. |
| `FEEDBACK` | Feedback integration event. Records that a |
ReviewerRoleEnum
¶
Bases: StrEnum
Feedback reviewer roles for the FeedbackCard system.
Identifies the domain expertise of the person providing feedback. The reviewer role affects how feedback is weighted during calibration updates -- artist-provided corrections carry higher authority than fan suggestions.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `ARTIST` | The artist themselves (or a confirmed representative). Highest authority for creative intent. |
| `MANAGER` | Artist manager or business representative. Authority for contractual and commercial metadata. |
| `MUSICOLOGIST` | Academic musicologist or music information retrieval expert. Authority for compositional analysis and historical context. |
| `PRODUCER` | Music producer who worked on the recording. Authority for session credits and technical contributions. |
| `FAN` | Community member / fan contributor. Valuable for crowd-sourced corrections but requires corroboration. Lowest weight. |
EvidenceTypeEnum
¶
Bases: StrEnum
Evidence types supporting feedback corrections.
When a reviewer submits a FeedbackCard with corrections, they
must indicate what evidence supports the correction. Evidence type
affects the credibility weighting of the correction during
calibration updates.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `LINER_NOTES` | Physical or digital liner notes from the release packaging. Strong documentary evidence. |
| `MEMORY` | Personal recollection of the reviewer (e.g., artist remembering who played on a session). Subject to recall bias. |
| `DOCUMENT` | Contractual or legal document (e.g., publishing agreement, session contract). Strongest documentary evidence. |
| `SESSION_NOTES` | Studio session notes or recording logs. Strong evidence for engineering and performance credits. |
| `OTHER` | Other evidence type not covered above. Requires free-text explanation in the |
PermissionTypeEnum
¶
Bases: StrEnum
Permission types for the MCP Permission Patchbay.
Defines the universe of machine-readable permission queries that AI
platforms and other consumers can issue via MCP. The taxonomy is
hierarchical: AI_TRAINING is a broad category with
AI_TRAINING_COMPOSITION, AI_TRAINING_RECORDING, and
AI_TRAINING_STYLE as finer-grained sub-permissions.
See Teikari (2026), section 7, for the Permission Patchbay design.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `STREAM` | Permission to stream the recording. |
| `DOWNLOAD` | Permission to download the recording for offline use. |
| `SYNC_LICENSE` | Synchronisation license (music paired with visual media). |
| `AI_TRAINING` | Broad permission for AI model training on any aspect of the work. |
| `AI_TRAINING_COMPOSITION` | AI training specifically on the compositional elements (melody, harmony, structure). |
| `AI_TRAINING_RECORDING` | AI training specifically on the recording (audio signal, mix, production qualities). |
| `AI_TRAINING_STYLE` | AI training on stylistic elements (timbre, groove, aesthetic). |
| `DATASET_INCLUSION` | Inclusion in a published research or training dataset. |
| `VOICE_CLONING` | Use of vocal performance for voice cloning / synthesis. |
| `STYLE_LEARNING` | Learning artistic style for generative imitation. |
| `LYRICS_IN_CHATBOTS` | Reproduction of lyrics in chatbot / LLM responses. |
| `COVER_VERSIONS` | Permission to create and distribute cover versions. |
| `REMIX` | Permission to create remixes of the recording. |
| `SAMPLE` | Permission to sample portions of the recording. |
| `DERIVATIVE_WORK` | Broad permission for any derivative work not covered above. |
PermissionValueEnum
¶
Bases: StrEnum
Permission response values for MCP consent queries.
Each permission entry in a PermissionBundle resolves to one of
these values. The values form a spectrum from unconditional denial
to unconditional allowance, with conditional variants in between.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `ALLOW` | Unconditional permission granted. |
| `DENY` | Permission explicitly denied. No exceptions. |
| `ASK` | Permission not pre-determined; the requester must contact the rights holder for case-by-case approval. |
| `ALLOW_WITH_ATTRIBUTION` | Permission granted on condition that proper attribution is included. Requires |
| `ALLOW_WITH_ROYALTY` | Permission granted on condition of royalty payment. Requires |
PermissionScopeEnum
¶
Bases: StrEnum
Permission scope levels defining granularity of consent.
Permissions can be set at different levels of granularity, from an entire catalog down to a single work. Broader scopes act as defaults that can be overridden by narrower scopes.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `CATALOG` | Applies to the entire catalog of the rights holder. Broadest scope. When scope is CATALOG, |
| `RELEASE` | Applies to a specific release (album, EP, single). Requires |
| `RECORDING` | Applies to a specific recording. Requires |
| `WORK` | Applies to a specific musical work (composition). Requires |
DelegationRoleEnum
¶
Bases: StrEnum
Delegation chain roles in the permission hierarchy.
A PermissionBundle may include a delegation chain showing who
granted permission authority to whom. This enables audit trails
for permission provenance (e.g., artist -> manager -> label ->
distributor).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `OWNER` | Original rights holder (typically the artist or songwriter). Root of the delegation chain. |
| `MANAGER` | Artist manager or business representative acting on behalf of the owner. |
| `LABEL` | Record label holding master recording rights via contract. |
| `DISTRIBUTOR` | Digital distributor handling platform delivery. Typically the outermost link in the delegation chain. |
PipelineFeedbackTypeEnum
¶
Bases: StrEnum
Pipeline feedback signal types for continuous improvement.
These are reverse-flow signals between pipelines, enabling the system to self-correct. For example, the Attribution Engine can signal back to Entity Resolution that its confidence estimates were miscalibrated, or the API layer can signal a dispute.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `REFETCH` | Signal from Entity Resolution to ETL: "data from source X is consistently wrong or stale, re-fetch from the source." |
| `RECALIBRATE` | Signal from Attribution Engine to Entity Resolution: "resolution confidence was miscalibrated; predicted confidence differs significantly from actual accuracy." |
| `DISPUTE` | Signal from API/Chat to Attribution Engine: "a user or rights holder has disputed this attribution; re-evaluate." |
| `STALE` | Signal from any pipeline: "this record has not been refreshed within its expected freshness window." |
UncertaintySourceEnum
¶
Bases: StrEnum
Uncertainty source taxonomy based on UProp (Duan 2025).
Classifies the origin of uncertainty in confidence estimates. The intrinsic/extrinsic decomposition (Duan 2025, arXiv:2506.17419) is the primary axis; aleatoric/epistemic is the classical secondary axis for compatibility with standard ML uncertainty literature.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `INTRINSIC` | Intrinsic uncertainty arising from noise in the input data itself (e.g., conflicting metadata across sources, ambiguous artist names). |
| `EXTRINSIC` | Extrinsic uncertainty arising from the model or pipeline (e.g., embedding model limitations, resolution algorithm edge cases). |
| `ALEATORIC` | Irreducible uncertainty inherent in the data-generating process. Cannot be reduced by collecting more data. |
| `EPISTEMIC` | Reducible uncertainty due to limited knowledge or data. Can be reduced by collecting more evidence or better models. |
UncertaintyDimensionEnum
¶
Bases: StrEnum
4-dimensional uncertainty framework (Liu 2025, arXiv:2503.15850).
Orthogonal to the intrinsic/extrinsic decomposition, this framework decomposes uncertainty along the information processing pipeline: from input through reasoning to final prediction.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `INPUT` | Uncertainty in the input data (noise, missing fields, ambiguity). Maps to |
| `REASONING` | Uncertainty in the reasoning/inference process (e.g., entity resolution logic, LLM chain-of-thought). Maps to |
| `PARAMETER` | Uncertainty in model parameters (e.g., embedding model weights, fuzzy matching thresholds). Maps to |
| `PREDICTION` | Uncertainty in the final prediction/output (e.g., the confidence score itself). Maps to |
ConfidenceMethodEnum
¶
Bases: StrEnum
Methods used to produce confidence scores.
Each StepUncertainty records which method was used to generate
its confidence estimate. Methods vary in cost, reliability, and
calibration quality. See Teikari (2026), section 5, for the
confidence scoring framework.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `SELF_REPORT` | Source-reported confidence (e.g., MusicBrainz data quality rating). Cheapest but least calibrated. |
| `MULTI_SAMPLE` | Multiple-sample consistency (e.g., querying an LLM multiple times and measuring agreement). |
| `LOGPROB` | Token log-probability from an LLM. Fast but requires logprob API access. |
| `ENSEMBLE` | Ensemble of multiple models or methods. More expensive but better calibrated than single-model approaches. |
| `CONFORMAL` | Conformal prediction providing coverage guarantees. Produces prediction sets rather than point estimates. |
| `SOURCE_WEIGHTED` | Weighted average across data sources based on historical reliability (Yanez 2025 approach). |
| `HUMAN_RATED` | Human expert rating. Highest authority but most expensive and slowest. |
| `HTC` | Holistic Trajectory Calibration (Zhang 2026, arXiv:2601.15778). Uses trajectory-level features across the full pipeline for calibration. |
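As a concrete illustration of the `MULTI_SAMPLE` idea, agreement across repeated queries can itself serve as the confidence estimate: take the mode of the sampled answers as the result and the agreement fraction as confidence. The `multi_sample_confidence` helper below is a hypothetical sketch, not part of the module.

```python
from collections import Counter


def multi_sample_confidence(samples: list[str]) -> tuple[str, float]:
    """MULTI_SAMPLE sketch: answer = mode of repeated query results,
    confidence = fraction of samples agreeing with the mode."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)
```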
CalibrationStatusEnum
¶
Bases: StrEnum
Calibration status of a confidence score.
Indicates whether a confidence score has been post-hoc calibrated (e.g., via Platt scaling or isotonic regression) to ensure that stated confidence matches empirical accuracy.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `CALIBRATED` | Score has been calibrated against a held-out calibration set. |
| `UNCALIBRATED` | Score is raw/uncalibrated. May exhibit over- or under-confidence. |
| `PENDING` | Calibration is pending (insufficient calibration data collected so far). Score should be treated as uncalibrated. |
ConfidenceTrendEnum
¶
Bases: StrEnum
Confidence trend across pipeline steps (Zhang 2026).
Characterises the trajectory of confidence scores as a record
passes through the pipeline. Used by TrajectoryCalibration
for HTC-based calibration. See Zhang (2026, arXiv:2601.15778).
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `INCREASING` | Confidence monotonically increases across steps. Typical when multiple corroborating sources are found. |
| `DECREASING` | Confidence monotonically decreases. May indicate conflicting evidence discovered during resolution. |
| `STABLE` | Confidence remains approximately constant. Typical for records with strong initial evidence (e.g., exact ID match). |
| `VOLATILE` | Confidence oscillates across steps. May indicate unstable resolution or contradictory evidence. Often triggers |
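A simple way to bucket a confidence trajectory into these four trends is to look at the step-to-step deltas, treating anything below a noise floor as flat. The `classify_trend` helper and the `eps` threshold are illustrative assumptions, not part of the module.

```python
from enum import Enum


# Local stand-in mirroring the documented StrEnum values.
class ConfidenceTrendEnum(str, Enum):
    INCREASING = "INCREASING"
    DECREASING = "DECREASING"
    STABLE = "STABLE"
    VOLATILE = "VOLATILE"


def classify_trend(scores: list[float], eps: float = 0.02) -> ConfidenceTrendEnum:
    """Classify a confidence trajectory; eps is an assumed noise floor."""
    deltas = [b - a for a, b in zip(scores, scores[1:])]
    if all(abs(d) <= eps for d in deltas):
        return ConfidenceTrendEnum.STABLE
    if all(d >= -eps for d in deltas):  # never drops by more than noise
        return ConfidenceTrendEnum.INCREASING
    if all(d <= eps for d in deltas):  # never rises by more than noise
        return ConfidenceTrendEnum.DECREASING
    return ConfidenceTrendEnum.VOLATILE  # mixed ups and downs
```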
AttributionMethodEnum
¶
Bases: StrEnum
Training data attribution (TDA) methods.
Future-readiness stubs for commercial landscape parity with Musical AI, Sureel, ProRata, and Sony's influence-function approach. These methods attempt to quantify how much a specific training example influenced a generative model's output.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `TRAINING_TIME_INFLUENCE` | Influence measured at training time (e.g., data Shapley values, TracIn). Requires access to training checkpoints. |
| `UNLEARNING_BASED` | Influence measured via machine unlearning (retrain-without and compare). Expensive but theoretically sound. |
| `INFLUENCE_FUNCTIONS` | Classical influence functions (Koh & Liang 2017). Approximates leave-one-out retraining via Hessian-vector products. |
| `EMBEDDING_SIMILARITY` | Cosine similarity in embedding space between source and generated content. Cheapest but least rigorous. |
| `WATERMARK_DETECTION` | Detection of embedded watermarks in generated content that trace back to training data (e.g., SynthID, AudioSeal). |
| `INFERENCE_TIME_CONDITIONING` | Attribution via inference-time conditioning or prompting (e.g., Musical AI's approach of conditioning generation on a known work). |
RightsTypeEnum
¶
Bases: StrEnum
Music rights types distinguishing compositional vs recording rights.
Future-readiness stub. In music licensing, rights are split between the composition (publishing) side and the recording (master) side. This distinction is critical for AI training attribution: a model may learn from the composition, the recording, or both.
Based on Sureel patent and LANDR rights management approaches.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `MASTER_RECORDING` | Rights in the specific audio recording (sound recording copyright). Typically held by the label or artist. |
| `COMPOSITION_PUBLISHING` | Rights in the underlying composition (musical work copyright). Typically held by the publisher or songwriter. |
| `PERFORMANCE` | Performance rights (public performance, broadcast). Managed by PROs (ASCAP, BMI, PRS, GEMA, etc.). |
| `MECHANICAL` | Mechanical reproduction rights (physical copies, downloads, interactive streams). |
| `SYNC` | Synchronisation rights (pairing music with visual media). |
MediaTypeEnum
¶
Bases: StrEnum
Multi-modal attribution media types.
Future-readiness stub for multi-modal training data attribution. While this scaffold focuses on audio, the Sureel and ProRata approaches are modality-agnostic and support cross-modal attribution.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `AUDIO` | Audio content (waveform, spectrogram). Primary modality for this scaffold. |
| `IMAGE` | Image content (album art, spectrograms as images). |
| `VIDEO` | Video content (music videos, live performances). |
| `TEXT` | Text content (lyrics, liner notes, reviews). |
| `SYMBOLIC_MUSIC` | Symbolic music representations (MIDI, MusicXML, ABC notation). |
| `MULTIMODAL` | Content spanning multiple modalities simultaneously. |
CertificationTypeEnum
¶
Bases: StrEnum
External certification and compliance attestation types.
Future-readiness stub for third-party certifications that validate
an AI system's training data practices. These certifications are
attached to ComplianceAttestation records.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `FAIRLY_TRAINED_LICENSED` | Fairly Trained certification indicating all training data was licensed or in the public domain. |
| `C2PA_PROVENANCE` | C2PA (Coalition for Content Provenance and Authenticity) provenance manifest attached to generated content. |
| `EU_AI_ACT_COMPLIANT` | Self-declared or audited compliance with EU AI Act requirements for general-purpose AI (GPAI) models. |
| `CMO_APPROVED` | Approved by a Collective Management Organisation (CMO) such as GEMA, PRS, or ASCAP for training data usage. |
WatermarkTypeEnum
¶
Bases: StrEnum
Audio watermark types for provenance tracking.
Future-readiness stub for audio watermarking systems that embed imperceptible identifiers in audio signals. Watermarks enable post-hoc attribution of AI-generated content back to training data or generation source.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `SYNTHID` | Google DeepMind's SynthID audio watermarking. Embeds identifiers in spectrogram space. |
| `AUDIOSEAL` | Meta's AudioSeal. Localised audio watermarking with a detector that identifies watermarked segments. |
| `WAVMARK` | WavMark academic watermarking approach. Embeds in the waveform domain. |
| `DIGIMARC` | Digimarc commercial watermarking. Used in broadcast monitoring and content identification. |
RevenueModelEnum
¶
Bases: StrEnum
Revenue sharing models for AI-generated music attribution.
Future-readiness stub for commercial revenue distribution models. Different platforms use different approaches to compensate rights holders whose works contributed to AI training.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `FLAT_FEE_UPFRONT` | One-time flat fee paid for training data licensing (e.g., LANDR model for stem packs). |
| `PRO_RATA_MONTHLY` | Monthly pro-rata distribution based on catalog size or usage (e.g., streaming royalty model applied to AI training). |
| `PER_GENERATION` | Payment per generation event that uses the rights holder's contribution (e.g., Kits AI voice model usage). |
| `INFLUENCE_BASED` | Payment proportional to measured influence on generated output (e.g., Musical AI / Sureel approach using TDA methods). |
RegulatoryFrameworkEnum
¶
Bases: StrEnum
Applicable regulatory and governance frameworks.
ISO 42001 defines internal AI governance roles; EU AI Act defines supply chain liability actors. They have zero terminological overlap and must be tracked separately. See Teikari (2026), section 8, for the regulatory mapping.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `ISO_42001` | ISO/IEC 42001 AI Management System standard. Defines internal governance roles (Top Management, AI System Owner, Internal Audit). |
| `EU_AI_ACT` | EU Artificial Intelligence Act (Regulation 2024/1689). Defines risk categories and obligations for AI system providers/deployers. |
| `GPAI_CODE_OF_PRACTICE` | General-Purpose AI Model Code of Practice (July 2025). Specifies transparency and copyright compliance requirements for GPAI models. |
| `DSM_DIRECTIVE` | EU Digital Single Market Directive (2019/790). Art. 3-4 govern text-and-data mining exceptions and opt-out mechanisms. |
| `ESPR_DPP` | EU Ecodesign for Sustainable Products Regulation / Digital Product Passport. Cross-domain provenance framework. |
| `GDPR` | EU General Data Protection Regulation. Relevant when attribution records contain personal data (artist identities, reviewer info). |
ComplianceActorEnum
¶
Bases: StrEnum
EU AI Act supply chain actors (Art. 3).
These are distinct from ISO 42001 internal governance roles (Top Management, AI System Owner, Internal Audit). An organization may simultaneously hold multiple actor classifications across different AI systems.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `PROVIDER` | Entity that develops or has an AI system developed and places it on the market or puts it into service (Art. 3(3)). |
| `DEPLOYER` | Entity that uses an AI system under its authority (Art. 3(4)). The music platform using the attribution system. |
| `AUTHORISED_REPRESENTATIVE` | Entity established in the EU mandated by a non-EU provider to act on their behalf (Art. 3(5)). |
| `IMPORTER` | Entity established in the EU that places an AI system from a third country on the EU market (Art. 3(6)). |
| `DISTRIBUTOR` | Entity in the supply chain that makes an AI system available on the EU market (Art. 3(7)). |
| `PRODUCT_MANUFACTURER` | Manufacturer of a product that integrates an AI system as a safety component (Art. 3(8)). |
TdmReservationMethodEnum
¶
Bases: StrEnum
Text-and-data-mining rights reservation methods.
Under EU DSM Directive Art. 4, copyright holders can opt out of TDM via machine-readable reservation. The GPAI Code of Practice (July 2025) requires providers to respect robots.txt and emerging protocols. Music has a structural gap: robots.txt is web-only and does not cover audio content accessed via APIs or streaming platforms.
See Teikari (2026), section 7, for the music-specific gap analysis.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `ROBOTS_TXT` | Standard robots.txt file on web servers. Web-only; does not cover audio files served via APIs or streaming platforms. |
| `LLMS_TXT` | Emerging llms.txt protocol for specifying LLM training permissions at the domain level. |
| `MACHINE_READABLE_TAG` | HTML meta tags or HTTP headers expressing TDM reservation (e.g., |
| `RIGHTS_RESERVATION_API` | Programmatic API for querying rights reservation status. More flexible than static files but requires infrastructure. |
| `MCP_PERMISSION_QUERY` | Model Context Protocol permission query. The approach advocated by this scaffold: machine-readable consent queries via MCP tools. |
Normalized Record (ETL Output)¶
normalized
¶
NormalizedRecord boundary object schema (BO-1).
Output of the Data Engineering (ETL) pipeline. A single music entity
normalized from one external source. Multiple NormalizedRecord instances
for the same real-world entity (from different sources) feed into the
Entity Resolution pipeline, which merges them into a ResolvedEntity.
This module defines the first boundary object in the five-pipeline
architecture. All ETL extractors -- regardless of source format -- produce
NormalizedRecord instances with a common schema, enabling uniform
downstream processing.
See Also
music_attribution.schemas.resolved : The next boundary object in the pipeline. Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.
IdentifierBundle
¶
Bases: BaseModel
Standard music industry identifiers bundle.
Collects all known standard identifiers for a music entity. At the A2/A3 assurance levels, at least one identifier must be present for machine-sourced records (MusicBrainz, Discogs, AcoustID). These identifiers are the primary key for exact-match entity resolution.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `isrc` | International Standard Recording Code. 12-character alphanumeric code uniquely identifying a specific recording (e.g., |
| `iswc` | International Standard Musical Work Code. Identifies the underlying composition (e.g., |
| `isni` | International Standard Name Identifier. 16-digit identifier for public identities of parties (e.g., |
| `ipi` | Interested Party Information code. 9-11 digit code identifying rights holders in collecting society databases. |
| `mbid` | MusicBrainz Identifier. UUID assigned by MusicBrainz to any entity in their database. |
| `discogs_id` | Discogs numeric entity ID. Integer identifier in the Discogs database. |
| `acoustid` | AcoustID identifier. UUID derived from audio fingerprint (Chromaprint) matching. |
Examples:
>>> bundle = IdentifierBundle(
... isrc="GBAYE0601498",
... mbid="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... )
>>> bundle.has_any()
True
>>> IdentifierBundle().has_any()
False
has_any
¶
Check if at least one identifier is set.
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if any identifier field is not None. |
Source code in src/music_attribution/schemas/normalized.py
SourceMetadata
¶
Bases: BaseModel
Typed source-specific metadata attached to a NormalizedRecord.
Contains supplementary information that varies by source but follows a common schema. Fields that do not apply to a particular source are left as their defaults (None or empty list).
| ATTRIBUTE | DESCRIPTION |
|---|---|
roles |
Credit roles reported by the source (free-text, not yet mapped
to
TYPE:
|
release_date |
Release date as reported by the source. String format varies
(ISO 8601 preferred, but partial dates like
TYPE:
|
release_country |
ISO 3166-1 alpha-2 country code for the release territory
(e.g.,
TYPE:
|
genres |
Genre tags reported by the source. Free-text, not standardised across sources.
TYPE:
|
duration_ms |
Track duration in milliseconds. May differ between sources due to different mastering or silence handling.
TYPE:
|
track_number |
Track position within the release medium.
TYPE:
|
medium_format |
Physical or digital medium format (e.g.,
TYPE:
|
language |
ISO 639-1 language code for lyrics/vocals (e.g.,
TYPE:
|
extras |
Catch-all for source-specific fields that do not map to the common schema. Keys and values are both strings.
TYPE:
|
Examples:
>>> meta = SourceMetadata(
... roles=["performer", "songwriter"],
... release_date="2005-10-17",
... release_country="GB",
... genres=["electronic", "art pop"],
... duration_ms=265000,
... )
Relationship
¶
Bases: BaseModel
Link between entities within a single data source.
Represents a directed edge from the parent NormalizedRecord to
another entity identified by its source-specific ID. These
relationships are source-local; cross-source relationship resolution
happens in the Entity Resolution pipeline, producing
ResolvedRelationship objects.
| ATTRIBUTE | DESCRIPTION |
|---|---|
relationship_type |
The type of relationship (e.g., PERFORMED, WROTE, PRODUCED).
TYPE:
|
target_source |
The data source of the target entity.
TYPE:
|
target_source_id |
Source-specific identifier of the target entity (e.g., a MusicBrainz MBID or Discogs numeric ID as string).
TYPE:
|
target_entity_type |
The entity type of the target (e.g., ARTIST, RECORDING).
TYPE:
|
attributes |
Additional relationship attributes (e.g.,
TYPE:
|
Examples:
>>> rel = Relationship(
... relationship_type=RelationshipTypeEnum.PERFORMED,
... target_source=SourceEnum.MUSICBRAINZ,
... target_source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... target_entity_type=EntityTypeEnum.ARTIST,
... attributes={"instrument": "vocals"},
... )
NormalizedRecord
¶
Bases: BaseModel
ETL output: normalized music metadata from a single external source.
The NormalizedRecord is the first boundary object (BO-1) in the
five-pipeline architecture. All ETL extractors produce
NormalizedRecord instances regardless of their source format.
Multiple records for the same real-world entity (from different
sources) are merged by the Entity Resolution pipeline into a
ResolvedEntity.
| ATTRIBUTE | DESCRIPTION |
|---|---|
schema_version |
Semantic version of the NormalizedRecord schema. Defaults to
TYPE:
|
record_id |
Unique identifier for this record. Auto-generated UUIDv4.
TYPE:
|
source |
Which data source provided this record.
TYPE:
|
source_id |
Source-specific identifier (e.g., MusicBrainz MBID, Discogs release ID as string).
TYPE:
|
entity_type |
The type of music entity this record represents.
TYPE:
|
canonical_name |
Primary name of the entity as reported by the source. Must be non-empty after whitespace stripping.
TYPE:
|
alternative_names |
Alternative names, aliases, or transliterations. Used during fuzzy entity resolution.
TYPE:
|
identifiers |
Standard music industry identifiers (ISRC, ISWC, ISNI, etc.). Machine sources (MusicBrainz, Discogs, AcoustID) must provide at least one identifier.
TYPE:
|
metadata |
Source-specific metadata (genres, release date, duration, etc.).
TYPE:
|
relationships |
Links to other entities within the same source.
TYPE:
|
fetch_timestamp |
UTC timestamp when this record was fetched from the source. Must be timezone-aware and not more than 60 seconds in the future (to catch clock skew).
TYPE:
|
source_confidence |
Source-reported confidence in the data, range [0.0, 1.0]. 0.0 = no confidence data available; 1.0 = verified by authority.
TYPE:
|
raw_payload |
Original API response preserved for debugging and re-processing. May be None if raw data is not retained.
TYPE:
|
Examples:
>>> from datetime import datetime, UTC
>>> record = NormalizedRecord(
... source=SourceEnum.MUSICBRAINZ,
... source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... entity_type=EntityTypeEnum.RECORDING,
... canonical_name="Hide and Seek",
... identifiers=IdentifierBundle(isrc="GBAYE0601498"),
... fetch_timestamp=datetime.now(UTC),
... source_confidence=0.87,
... )
See Also
ResolvedEntity : The next boundary object produced by Entity Resolution.
validate_canonical_name
classmethod
¶
Canonical name must be non-empty after stripping.
Source code in src/music_attribution/schemas/normalized.py
validate_fetch_timestamp
classmethod
¶
Fetch timestamp must be timezone-aware and not far in the future.
Source code in src/music_attribution/schemas/normalized.py
validate_identifiers_for_machine_sources
¶
validate_identifiers_for_machine_sources() -> NormalizedRecord
Machine sources require at least one identifier.
Source code in src/music_attribution/schemas/normalized.py
Resolved Entity (Resolution Output)¶
resolved
¶
ResolvedEntity boundary object schema (BO-2).
Output of the Entity Resolution pipeline. A unified entity that merges
multiple NormalizedRecord instances from different sources into a
single canonical entity with resolution confidence and assurance level.
The ResolvedEntity is the second boundary object in the five-pipeline
architecture. It carries forward the provenance of every source that
contributed to it, enabling downstream attribution scoring to weight
sources by reliability.
See Also
music_attribution.schemas.normalized : The preceding boundary object. music_attribution.schemas.attribution : The next boundary object. Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.
SourceReference
¶
Bases: BaseModel
Reference to a contributing NormalizedRecord.
Links a ResolvedEntity back to the specific NormalizedRecord
that contributed to it, preserving full provenance. The agreement
score measures how well this source's data aligns with the resolved
consensus.
| ATTRIBUTE | DESCRIPTION |
|---|---|
record_id |
UUID of the contributing
TYPE:
|
source |
Which data source provided the record.
TYPE:
|
source_id |
Source-specific identifier of the record.
TYPE:
|
agreement_score |
How well this source agrees with the resolved consensus, range [0.0, 1.0]. 1.0 = perfect agreement on all fields; 0.0 = complete disagreement.
TYPE:
|
Examples:
>>> ref = SourceReference(
... record_id=uuid.uuid4(),
... source=SourceEnum.MUSICBRAINZ,
... source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... agreement_score=0.95,
... )
ResolutionDetails
¶
Bases: BaseModel
Per-method confidence breakdown for entity resolution.
Records the confidence contribution from each resolution method that
was attempted. Only populated fields were actually used; None
means that method was not applied. This enables post-hoc analysis
of which methods are most effective for different entity types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
string_similarity |
Confidence from fuzzy string matching (Jaro-Winkler, Levenshtein), range [0.0, 1.0]. None if not attempted.
TYPE:
|
embedding_similarity |
Confidence from semantic embedding similarity (cosine distance), range [0.0, 1.0]. None if not attempted.
TYPE:
|
graph_path_confidence |
Confidence from graph-based resolution (path length, co-occurrence patterns), range [0.0, 1.0]. None if not attempted.
TYPE:
|
llm_confidence |
Confidence from LLM-assisted resolution, range [0.0, 1.0]. None if not attempted.
TYPE:
|
matched_identifiers |
Names of identifiers that matched exactly (e.g.,
TYPE:
|
Examples:
>>> details = ResolutionDetails(
... string_similarity=0.92,
... matched_identifiers=["isrc"],
... )
ResolvedRelationship
¶
Bases: BaseModel
Resolved cross-entity relationship link.
Unlike source-local Relationship objects in NormalizedRecord,
a ResolvedRelationship links two ResolvedEntity instances
and is backed by one or more corroborating data sources.
| ATTRIBUTE | DESCRIPTION |
|---|---|
target_entity_id |
UUID of the target
TYPE:
|
relationship_type |
The type of relationship (e.g., PERFORMED, WROTE).
TYPE:
|
confidence |
Confidence in this relationship, range [0.0, 1.0]. Higher when multiple sources corroborate the link.
TYPE:
|
supporting_sources |
Data sources that corroborate this relationship. More sources generally means higher confidence.
TYPE:
|
Examples:
>>> rel = ResolvedRelationship(
... target_entity_id=uuid.uuid4(),
... relationship_type=RelationshipTypeEnum.PERFORMED,
... confidence=0.92,
... supporting_sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
... )
Conflict
¶
Bases: BaseModel
Unresolved disagreement between data sources.
When entity resolution encounters contradictory information from
different sources for the same field, it records a Conflict
rather than silently choosing one value. Conflicts with severity
HIGH or CRITICAL trigger needs_review = True on the parent
ResolvedEntity.
| ATTRIBUTE | DESCRIPTION |
|---|---|
field |
Name of the field in conflict (e.g.,
TYPE:
|
values |
Mapping of source name to its reported value. Keys are source
identifiers (e.g.,
TYPE:
|
severity |
How severe the disagreement is, from LOW (auto-resolvable) to CRITICAL (blocks attribution).
TYPE:
|
Examples:
>>> conflict = Conflict(
... field="canonical_name",
... values={"MUSICBRAINZ": "Imogen Heap", "DISCOGS": "I. Heap"},
... severity=ConflictSeverityEnum.LOW,
... )
ResolvedEntity
¶
Bases: BaseModel
Unified entity resolved from multiple data sources.
The ResolvedEntity is the second boundary object (BO-2) in the
five-pipeline architecture. It is produced by the Entity Resolution
pipeline and consumed by the Attribution Engine. Each instance
represents a single real-world music entity (artist, recording,
work, etc.) with a canonical identity established by merging one
or more NormalizedRecord instances.
| ATTRIBUTE | DESCRIPTION |
|---|---|
schema_version |
Semantic version of the ResolvedEntity schema. Defaults to
TYPE:
|
entity_id |
Unique identifier for this resolved entity. Auto-generated UUIDv4.
TYPE:
|
entity_type |
The type of music entity (RECORDING, WORK, ARTIST, etc.).
TYPE:
|
canonical_name |
Best-consensus name for the entity, chosen from contributing sources by the resolution algorithm.
TYPE:
|
alternative_names |
All other names/aliases from contributing sources, used for future matching and display.
TYPE:
|
identifiers |
Merged identifier bundle combining identifiers from all contributing sources.
TYPE:
|
source_records |
References to all
TYPE:
|
resolution_method |
Primary method used to resolve/merge the source records.
TYPE:
|
resolution_confidence |
Overall confidence in the resolution, range [0.0, 1.0]. This is the resolution pipeline's assessment of how likely it is that all merged records truly refer to the same entity.
TYPE:
|
resolution_details |
Per-method confidence breakdown showing which methods contributed and their individual confidence scores.
TYPE:
|
assurance_level |
A0-A3 assurance level determined by the number and quality of corroborating sources. See Teikari (2026), section 6.
TYPE:
|
relationships |
Cross-entity links resolved from source-local relationships.
TYPE:
|
conflicts |
Unresolved disagreements between sources. May trigger human review if severity is HIGH or CRITICAL.
TYPE:
|
needs_review |
Flag indicating this entity requires human review before attribution scoring proceeds.
TYPE:
|
review_reason |
Human-readable explanation of why review is needed. Required
when
TYPE:
|
merged_from |
If this entity was formed by merging previously separate
TYPE:
|
resolved_at |
UTC timestamp when resolution was performed. Must be timezone-aware.
TYPE:
|
Examples:
>>> from datetime import datetime, UTC
>>> entity = ResolvedEntity(
... entity_type=EntityTypeEnum.RECORDING,
... canonical_name="Hide and Seek",
... source_records=[
... SourceReference(
... record_id=uuid.uuid4(),
... source=SourceEnum.MUSICBRAINZ,
... source_id="abc-123",
... agreement_score=0.95,
... ),
... ],
... resolution_method=ResolutionMethodEnum.EXACT_ID,
... resolution_confidence=0.98,
... assurance_level=AssuranceLevelEnum.LEVEL_2,
... resolved_at=datetime.now(UTC),
... )
See Also
NormalizedRecord : The preceding boundary object from ETL. AttributionRecord : The next boundary object from Attribution Engine.
validate_resolved_at
classmethod
¶
Resolved timestamp must be timezone-aware.
Source code in src/music_attribution/schemas/resolved.py
validate_review_fields
¶
validate_review_fields() -> ResolvedEntity
If needs_review is True, review_reason must be provided.
Source code in src/music_attribution/schemas/resolved.py
Attribution Record (Engine Output)¶
attribution
¶
AttributionRecord boundary object schema (BO-3).
Output of the Attribution Engine pipeline. A complete attribution record for a musical work/recording with calibrated confidence scores, conformal prediction sets, and a full provenance chain.
The AttributionRecord is the third boundary object in the five-pipeline
architecture and the primary output consumed by the API/MCP Server and
Chat Interface pipelines. It carries the complete audit trail of how an
attribution was constructed, enabling transparent confidence communication
to end users.
See Also
music_attribution.schemas.resolved : The preceding boundary object. music_attribution.schemas.feedback : Reverse-flow feedback from users. music_attribution.schemas.uncertainty : Uncertainty decomposition models. Teikari, P. (2026). Music Attribution with Transparent Confidence, sections 5-6.
EventDetails
module-attribute
¶
EventDetails = Annotated[
FetchEventDetails
| ResolveEventDetails
| ScoreEventDetails
| ReviewEventDetails
| UpdateEventDetails
| FeedbackEventDetails,
Field(discriminator="type"),
]
Discriminated union of provenance event detail types.
Uses Pydantic's discriminator field (type) to deserialize into the
correct detail class. Each variant corresponds to a
ProvenanceEventTypeEnum value.
Credit
¶
Bases: BaseModel
Attribution credit for a single entity-role pair.
Represents one line item in the attribution: a specific entity (artist, producer, etc.) credited in a specific role on a musical work or recording. Each credit carries its own confidence score and assurance level, independent of the overall record.
| ATTRIBUTE | DESCRIPTION |
|---|---|
entity_id |
UUID of the
TYPE:
|
entity_name |
Display name of the credited entity. Defaults to empty string; populated for API/UI convenience.
TYPE:
|
role |
The role in which the entity is credited (e.g., PERFORMER, SONGWRITER, PRODUCER).
TYPE:
|
role_detail |
Additional role detail not captured by the enum (e.g.,
TYPE:
|
confidence |
Confidence in this specific credit assignment, range [0.0, 1.0]. May differ from the overall record confidence.
TYPE:
|
sources |
Data sources that corroborate this credit. More sources generally yield higher confidence.
TYPE:
|
assurance_level |
A0-A3 assurance level for this specific credit.
TYPE:
|
Examples:
>>> credit = Credit(
... entity_id=uuid.uuid4(),
... entity_name="Imogen Heap",
... role=CreditRoleEnum.PERFORMER,
... role_detail="lead vocals, keyboards",
... confidence=0.95,
... sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
... assurance_level=AssuranceLevelEnum.LEVEL_2,
... )
ConformalSet
¶
Bases: BaseModel
Conformal prediction set at a specified coverage level.
Instead of a single point prediction for each credit role, conformal prediction produces a set of plausible roles that contains the true role with a guaranteed probability (the coverage level). Smaller sets indicate higher confidence. See Teikari (2026), section 5.
| ATTRIBUTE | DESCRIPTION |
|---|---|
coverage_level |
Target coverage probability, range (0.0, 1.0) exclusive. Typical values: 0.90 (90% coverage) or 0.95 (95% coverage).
TYPE:
|
prediction_sets |
Mapping of entity ID (as string) to the set of plausible roles for that entity. Smaller sets = more certain attribution.
TYPE:
|
set_sizes |
Mapping of entity ID to the cardinality of their prediction set. A set size of 1 means the role is unambiguous at the given coverage level.
TYPE:
|
marginal_coverage |
Observed marginal coverage on the calibration set, range
[0.0, 1.0]. Should be close to
TYPE:
|
calibration_error |
Absolute difference between
TYPE:
|
calibration_method |
Name of the calibration method used (e.g.,
TYPE:
|
calibration_set_size |
Number of examples in the calibration set. Larger sets give tighter coverage guarantees. Non-negative.
TYPE:
|
Examples:
>>> conformal = ConformalSet(
... coverage_level=0.90,
... prediction_sets={"entity-uuid": [CreditRoleEnum.PERFORMER]},
... set_sizes={"entity-uuid": 1},
... marginal_coverage=0.91,
... calibration_error=0.01,
... calibration_method="split_conformal",
... calibration_set_size=500,
... )
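The relationship between prediction sets, set sizes, and marginal coverage can be sketched on toy data. This is a hypothetical illustration of the field semantics documented above, not the project's calibration code.

```python
# Toy held-out data: per-entity prediction sets and the known true roles.
prediction_sets = {
    "entity-1": {"PERFORMER"},               # set size 1: role is unambiguous
    "entity-2": {"PERFORMER", "PRODUCER"},   # set size 2: two plausible roles
}
true_roles = {"entity-1": "PERFORMER", "entity-2": "PRODUCER"}

# Marginal coverage: fraction of entities whose true role falls in their set.
covered = sum(true_roles[eid] in roles for eid, roles in prediction_sets.items())
marginal_coverage = covered / len(prediction_sets)

set_sizes = {eid: len(roles) for eid, roles in prediction_sets.items()}
calibration_error = abs(marginal_coverage - 0.90)  # target coverage_level = 0.90

print(marginal_coverage)  # 1.0 -- both true roles fall inside their sets
```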
FetchEventDetails
¶
Bases: BaseModel
Details for FETCH provenance events.
Records metadata about a data fetch operation from an external source as part of the provenance chain.
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field for the
TYPE:
|
source |
The data source that was queried.
TYPE:
|
source_id |
Source-specific query identifier or endpoint.
TYPE:
|
records_fetched |
Number of records returned by the fetch. Non-negative.
TYPE:
|
rate_limited |
Whether the fetch was rate-limited by the source API.
TYPE:
|
ResolveEventDetails
¶
Bases: BaseModel
Details for RESOLVE provenance events.
Records metadata about an entity resolution step, including the method used and the reduction ratio (input records to output entities).
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field. Always
TYPE:
|
method |
Name of the resolution method or algorithm used.
TYPE:
|
records_input |
Number of
TYPE:
|
entities_output |
Number of
TYPE:
|
confidence_range |
(min, max) confidence range across all output entities.
Defaults to
TYPE:
|
ScoreEventDetails
¶
Bases: BaseModel
Details for SCORE provenance events.
Records a confidence scoring or recalibration step, showing how the confidence value changed.
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field. Always
TYPE:
|
previous_confidence |
Confidence before this scoring step, range [0.0, 1.0]. None for the initial scoring event.
TYPE:
|
new_confidence |
Confidence after this scoring step, range [0.0, 1.0].
TYPE:
|
scoring_method |
Name of the scoring/calibration method applied (e.g.,
TYPE:
|
ReviewEventDetails
¶
Bases: BaseModel
Details for REVIEW provenance events.
Records that a human reviewer examined the attribution and
optionally applied corrections from a FeedbackCard.
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field. Always
TYPE:
|
reviewer_id |
Identifier of the reviewer who performed the review.
TYPE:
|
feedback_card_id |
UUID of the
TYPE:
|
corrections_applied |
Number of field corrections accepted from the feedback card. Non-negative. Zero means the reviewer confirmed the record without changes.
TYPE:
|
UpdateEventDetails
¶
Bases: BaseModel
Details for UPDATE provenance events.
Records a version bump on the attribution record, including which fields changed and what triggered the update.
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field. Always
TYPE:
|
previous_version |
Version number before this update. Minimum 1.
TYPE:
|
new_version |
Version number after this update. Minimum 1. Should be
TYPE:
|
fields_changed |
Names of fields that were modified in this update.
TYPE:
|
trigger |
What triggered the update (e.g.,
TYPE:
|
FeedbackEventDetails
¶
Bases: BaseModel
Details for FEEDBACK provenance events.
Records that a FeedbackCard was processed by the Attribution
Engine and its corrections were either accepted or rejected.
| ATTRIBUTE | DESCRIPTION |
|---|---|
type |
Discriminator field. Always
TYPE:
|
feedback_card_id |
UUID of the
TYPE:
|
overall_assessment |
The reviewer's overall assessment score from the feedback card, range [0.0, 1.0].
TYPE:
|
corrections_count |
Number of corrections in the feedback card. Non-negative.
TYPE:
|
accepted |
Whether the feedback was accepted and applied to the attribution record.
TYPE:
|
ProvenanceEvent
¶
Bases: BaseModel
Single event in the attribution provenance audit trail.
Each ProvenanceEvent records one discrete action that contributed
to or modified an attribution record. The chain of events forms an
immutable audit trail enabling full transparency of how an
attribution was constructed and refined.
| ATTRIBUTE | DESCRIPTION |
|---|---|
event_type |
High-level event type (FETCH, RESOLVE, SCORE, REVIEW, UPDATE, FEEDBACK).
TYPE:
|
timestamp |
UTC timestamp when this event occurred. Must be timezone-aware.
TYPE:
|
agent |
Identifier of the software agent or human that performed this
action (e.g.,
TYPE:
|
details |
Typed event details (discriminated union). The concrete type
matches
TYPE:
|
feedback_card_id |
UUID of the associated
TYPE:
|
step_uncertainty |
Uncertainty decomposition for this specific pipeline step, if available.
TYPE:
|
citation_index |
1-based citation index for referencing this event in chat responses. None if not cited. Minimum 1 when set.
TYPE:
|
validate_timestamp
classmethod
¶
Timestamp must be timezone-aware.
Source code in src/music_attribution/schemas/attribution.py
AttributionRecord
¶
Bases: BaseModel
Complete attribution record for a musical work or recording.
The AttributionRecord is the third boundary object (BO-3) in the
five-pipeline architecture and the primary deliverable of the
Attribution Engine. It is consumed by the API/MCP Server (for
permission queries) and the Chat Interface (for user-facing
attribution display).
Each record contains: (1) a list of credits with per-credit confidence, (2) conformal prediction sets providing coverage guarantees, (3) an immutable provenance chain, and (4) an optional uncertainty summary. See Teikari (2026), sections 5-6.
| ATTRIBUTE | DESCRIPTION |
|---|---|
schema_version |
Semantic version of the AttributionRecord schema. Defaults to
TYPE:
|
attribution_id |
Unique identifier for this attribution record. Auto-generated UUIDv4.
TYPE:
|
work_entity_id |
UUID of the
TYPE:
|
work_title |
Display title of the work. Populated for API/UI convenience.
TYPE:
|
artist_name |
Display name of the primary artist. Populated for API/UI convenience.
TYPE:
|
credits |
Attribution credits. Must contain at least one credit. Each credit links an entity to a role with confidence scoring.
TYPE:
|
assurance_level |
Overall A0-A3 assurance level for this attribution record, determined by the weakest link in the evidence chain.
TYPE:
|
confidence_score |
Overall calibrated confidence score, range [0.0, 1.0]. Aggregated from per-credit confidences and source agreement.
TYPE:
|
conformal_set |
Conformal prediction set providing coverage guarantees on role assignments.
TYPE:
|
source_agreement |
Degree of agreement across data sources, range [0.0, 1.0]. 1.0 = all sources agree on all credits; 0.0 = total disagreement.
TYPE:
|
provenance_chain |
Ordered list of provenance events forming the audit trail. Events are appended chronologically.
TYPE:
|
uncertainty_summary |
Aggregated uncertainty decomposition across all pipeline steps. None if uncertainty tracking is not enabled.
TYPE:
|
needs_review |
Flag indicating this record requires human review before being surfaced to end users.
TYPE:
|
review_priority |
Priority score for the review queue, range [0.0, 1.0]. Higher values = more urgent review needed.
TYPE:
|
created_at |
UTC timestamp when this record was first created. Must be timezone-aware.
TYPE:
|
updated_at |
UTC timestamp of the most recent update. Must be
timezone-aware. Must be >=
TYPE:
|
version |
Monotonically increasing version number. Minimum 1. Bumped on every update.
TYPE:
|
Examples:
>>> from datetime import datetime, UTC
>>> record = AttributionRecord(
... work_entity_id=uuid.uuid4(),
... work_title="Hide and Seek",
... artist_name="Imogen Heap",
... credits=[
... Credit(
... entity_id=uuid.uuid4(),
... entity_name="Imogen Heap",
... role=CreditRoleEnum.PERFORMER,
... confidence=0.95,
... sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
... assurance_level=AssuranceLevelEnum.LEVEL_2,
... ),
... ],
... assurance_level=AssuranceLevelEnum.LEVEL_2,
... confidence_score=0.92,
... conformal_set=ConformalSet(
... coverage_level=0.90,
... marginal_coverage=0.91,
... calibration_error=0.01,
... calibration_method="split_conformal",
... calibration_set_size=500,
... ),
... source_agreement=0.88,
... review_priority=0.1,
... created_at=datetime.now(UTC),
... updated_at=datetime.now(UTC),
... version=1,
... )
See Also
ResolvedEntity : The preceding boundary object from Entity Resolution. FeedbackCard : Reverse-flow feedback for calibration updates.
validate_timestamps
classmethod
¶
Timestamps must be timezone-aware.
Source code in src/music_attribution/schemas/attribution.py
validate_updated_after_created
¶
validate_updated_after_created() -> AttributionRecord
updated_at must be >= created_at.
Source code in src/music_attribution/schemas/attribution.py
Permissions¶
permissions
¶
PermissionBundle boundary object schema (BO-5).
Machine-readable permission specification for MCP consent queries. Implements the Permission Patchbay from Teikari (2026), section 7.
The PermissionBundle enables AI platforms and other consumers to
programmatically query whether specific uses of a musical work are
permitted, under what conditions, and who authorised the permission.
This is the MCP-native alternative to robots.txt for audio content.
See Also
music_attribution.schemas.enums : Permission-related enums. Teikari, P. (2026). Music Attribution with Transparent Confidence, section 7 (Permission Patchbay).
PermissionCondition
¶
Bases: BaseModel
Optional condition attached to a permission entry.
Conditions qualify a permission with additional constraints. For example, a permission might only apply in certain territories or time periods.
| ATTRIBUTE | DESCRIPTION |
|---|---|
condition_type |
Type of condition (e.g.,
TYPE:
|
value |
Condition value as a string. Format depends on
TYPE:
|
Examples:
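A minimal sketch of a condition instance, using the two fields documented above. The `"territory"` tag is an assumed example value, and the dataclass stands in for the real Pydantic BaseModel.

```python
from dataclasses import dataclass

@dataclass
class PermissionCondition:
    """Stdlib stand-in for the Pydantic model; fields from the table above."""
    condition_type: str
    value: str

# A territory restriction: format of `value` depends on `condition_type`.
cond = PermissionCondition(condition_type="territory", value="US")
print(cond.condition_type, cond.value)  # territory US
```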
PermissionEntry
¶
Bases: BaseModel
A single permission with optional conditions and requirements.
Each entry specifies a permission type (what use), a value (allow, deny, ask, or conditional), and optional requirements. Conditional values (ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY) require their respective fields to be populated.
| ATTRIBUTE | DESCRIPTION |
|---|---|
permission_type |
What kind of use this permission governs (e.g., AI_TRAINING, STREAM, REMIX).
TYPE:
|
value |
The permission decision (ALLOW, DENY, ASK, ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY).
TYPE:
|
conditions |
Additional conditions qualifying this permission.
TYPE:
|
royalty_rate |
Royalty rate as a decimal (e.g.,
TYPE:
|
attribution_requirement |
Required attribution text or format. Required when
TYPE:
|
territory |
ISO 3166-1 alpha-2 country codes where this permission applies. None means worldwide.
TYPE:
|
Examples:
>>> entry = PermissionEntry(
... permission_type=PermissionTypeEnum.AI_TRAINING,
... value=PermissionValueEnum.ALLOW_WITH_ROYALTY,
... royalty_rate=Decimal("0.015"),
... territory=["US", "GB"],
... )
validate_conditional_fields
¶
validate_conditional_fields() -> PermissionEntry
Validate fields required by specific permission values.
Source code in src/music_attribution/schemas/permissions.py
DelegationEntry
¶
Bases: BaseModel
An entry in the permission delegation chain.
Models the chain of authority from the rights owner through intermediaries (manager, label, distributor). Each entry specifies what authority the entity has over the permissions.
| ATTRIBUTE | DESCRIPTION |
|---|---|
entity_id |
UUID of the entity in the delegation chain (a
TYPE:
|
role |
The entity's role in the delegation chain (OWNER, MANAGER, LABEL, DISTRIBUTOR).
TYPE:
|
can_modify |
Whether this entity can modify permission entries. Typically True for OWNER and MANAGER, False for DISTRIBUTOR.
TYPE:
|
can_delegate |
Whether this entity can further delegate authority to another entity.
TYPE:
|
Examples:
>>> entry = DelegationEntry(
... entity_id=uuid.uuid4(),
... role=DelegationRoleEnum.OWNER,
... can_modify=True,
... can_delegate=True,
... )
PermissionBundle
¶
Bases: BaseModel
Machine-readable permission specification (BO-5).
The PermissionBundle is the boundary object used by the API/MCP
Server pipeline to answer permission queries from AI platforms and
other consumers. It implements the Permission Patchbay design from
Teikari (2026), section 7.
Each bundle specifies permissions at a given scope (catalog, release, recording, or work) with an effective date range, a delegation chain showing who authorised the permissions, and a default permission for unlisted permission types.
| ATTRIBUTE | DESCRIPTION |
|---|---|
schema_version |
Semantic version of the PermissionBundle schema. Defaults to
TYPE:
|
permission_id |
Unique identifier for this permission bundle. Auto-generated UUIDv4.
TYPE:
|
entity_id |
UUID of the rights holder entity (artist, label, publisher).
TYPE:
|
scope |
Granularity of this permission bundle (CATALOG, RELEASE, RECORDING, WORK).
TYPE:
|
scope_entity_id |
UUID of the specific entity this permission applies to. Must be None when scope is CATALOG; must be non-None otherwise.
TYPE:
|
permissions |
List of individual permission entries. Must contain at least one.
TYPE:
|
effective_from |
UTC timestamp from which this bundle is effective. Must be timezone-aware.
TYPE:
|
effective_until |
UTC timestamp until which this bundle is effective. None means
no expiry. Must be after
TYPE:
|
delegation_chain |
Chain of authority from the rights owner through intermediaries.
TYPE:
|
default_permission |
Default response for permission types not explicitly listed in
TYPE:
|
created_by |
UUID of the entity that created this permission bundle.
TYPE:
|
updated_at |
UTC timestamp of the most recent update. Must be timezone-aware.
TYPE:
|
version |
Monotonically increasing version number. Minimum 1.
TYPE:
|
Examples:
>>> from datetime import datetime, UTC
>>> from decimal import Decimal
>>> bundle = PermissionBundle(
... entity_id=uuid.uuid4(),
... scope=PermissionScopeEnum.CATALOG,
... permissions=[
... PermissionEntry(
... permission_type=PermissionTypeEnum.AI_TRAINING,
... value=PermissionValueEnum.DENY,
... ),
... ],
... effective_from=datetime.now(UTC),
... default_permission=PermissionValueEnum.ASK,
... created_by=uuid.uuid4(),
... updated_at=datetime.now(UTC),
... version=1,
... )
See Also
PermissionEntry : Individual permission with conditions. DelegationEntry : Authority chain for permission provenance.
validate_timestamps
classmethod
¶
Timestamps must be timezone-aware.
Source code in src/music_attribution/schemas/permissions.py
validate_effective_until
classmethod
¶
effective_until must be timezone-aware when set.
Source code in src/music_attribution/schemas/permissions.py
validate_scope_consistency
¶
validate_scope_consistency() -> PermissionBundle
scope_entity_id must be None for CATALOG, required for others.
Source code in src/music_attribution/schemas/permissions.py
validate_effective_range
¶
validate_effective_range() -> PermissionBundle
effective_from must be before effective_until.
Source code in src/music_attribution/schemas/permissions.py
Feedback¶
feedback
¶
FeedbackCard boundary object schema (BO-4).
Structured feedback from domain experts (artists, managers, musicologists, producers). Flows from the Chat Interface back into the Attribution Engine for calibration updates. Ref: Zhou et al., 2023 -- FeedbackCards.
The FeedbackCard is the primary reverse-flow boundary object in the
five-pipeline architecture, enabling human-in-the-loop calibration. When
a domain expert reviews an attribution and provides corrections, these
are captured in a FeedbackCard that feeds back into the Attribution
Engine for confidence recalibration.
See Also
music_attribution.schemas.attribution : The AttributionRecord being reviewed. Teikari, P. (2026). Music Attribution with Transparent Confidence, section 6.
Correction
¶
Bases: BaseModel
A specific correction to an attribution field.
Represents a single field-level correction proposed by a reviewer. Each correction includes the current (incorrect) value, the proposed corrected value, and the reviewer's confidence in their correction.
| ATTRIBUTE | DESCRIPTION |
|---|---|
field |
Name of the field being corrected (e.g., "role").
TYPE:
|
current_value |
The current (incorrect) value of the field as a string.
TYPE:
|
corrected_value |
The proposed correct value as a string.
TYPE:
|
entity_id |
UUID of the specific entity this correction applies to, if the correction is entity-specific (e.g., correcting a credit role). None for record-level corrections.
TYPE:
|
confidence_in_correction |
The reviewer's confidence that their correction is accurate, range [0.0, 1.0]. Higher values from authoritative reviewers (e.g., the artist) carry more weight during recalibration.
TYPE:
|
evidence |
Free-text description of the evidence supporting this
correction (e.g., "Confirmed in studio session notes").
TYPE:
|
Examples:
>>> correction = Correction(
... field="role",
... current_value="PERFORMER",
... corrected_value="PRODUCER",
... entity_id=uuid.uuid4(),
... confidence_in_correction=0.95,
... evidence="Confirmed in studio session notes",
... )
FeedbackCard
¶
Bases: BaseModel
Structured feedback from a domain expert (BO-4).
The FeedbackCard is the reverse-flow boundary object in the
five-pipeline architecture, flowing from the Chat Interface back
into the Attribution Engine. It captures structured corrections
and an overall assessment from a domain expert who reviewed an
AttributionRecord.
A valid FeedbackCard must contain either corrections or free-text
(or both). The center_bias_flag is automatically set when the
overall assessment falls in the [0.45, 0.55] range, indicating
potential anchoring bias toward the midpoint.
| ATTRIBUTE | DESCRIPTION |
|---|---|
schema_version |
Semantic version of the FeedbackCard schema. Defaults to
TYPE:
|
feedback_id |
Unique identifier for this feedback card. Auto-generated UUIDv4.
TYPE:
|
attribution_id |
UUID of the AttributionRecord being reviewed.
TYPE:
|
reviewer_id |
Identifier of the reviewer (may be an email, username, or external ID).
TYPE:
|
reviewer_role |
Domain expertise of the reviewer (ARTIST, MANAGER, MUSICOLOGIST, PRODUCER, FAN).
TYPE:
|
attribution_version |
Version of the AttributionRecord that was reviewed.
TYPE:
|
corrections |
Specific field-level corrections proposed by the reviewer. May be empty if only free-text feedback is provided.
TYPE:
|
overall_assessment |
Reviewer's overall assessment of the attribution quality, range [0.0, 1.0]. 0.0 = completely wrong; 1.0 = perfect.
TYPE:
|
center_bias_flag |
Automatically set to True if overall_assessment falls in the [0.45, 0.55] range, indicating potential anchoring bias toward the midpoint.
TYPE:
|
free_text |
Free-text feedback for nuances not captured by structured corrections.
TYPE:
|
evidence_type |
Type of evidence supporting the feedback (LINER_NOTES, MEMORY, DOCUMENT, SESSION_NOTES, OTHER).
TYPE:
|
submitted_at |
UTC timestamp when the feedback was submitted. Must be timezone-aware.
TYPE:
|
Examples:
>>> from datetime import datetime, UTC
>>> card = FeedbackCard(
... attribution_id=uuid.uuid4(),
... reviewer_id="imogen.heap@example.com",
... reviewer_role=ReviewerRoleEnum.ARTIST,
... attribution_version=1,
... corrections=[
... Correction(
... field="role",
... current_value="PERFORMER",
... corrected_value="SONGWRITER",
... confidence_in_correction=1.0,
... ),
... ],
... overall_assessment=0.7,
... evidence_type=EvidenceTypeEnum.MEMORY,
... submitted_at=datetime.now(UTC),
... )
See Also
AttributionRecord : The record being reviewed. Correction : Individual field-level corrections.
validate_submitted_at
classmethod
¶
submitted_at must be timezone-aware.
Source code in src/music_attribution/schemas/feedback.py
validate_content_not_empty
¶
validate_content_not_empty() -> FeedbackCard
A feedback card must have corrections or free_text.
Source code in src/music_attribution/schemas/feedback.py
validate_center_bias
¶
validate_center_bias() -> FeedbackCard
Set center_bias_flag if overall_assessment is in [CENTER_BIAS_LOW, CENTER_BIAS_HIGH].
Source code in src/music_attribution/schemas/feedback.py
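The center-bias rule can be sketched as a plain predicate. This is an illustrative mirror of the documented [0.45, 0.55] band (CENTER_BIAS_LOW/CENTER_BIAS_HIGH), not the library's validator itself:

```python
# Assessments clustering at the midpoint may reflect anchoring rather
# than genuine judgment; the validator flags them for downstream review.
CENTER_BIAS_LOW = 0.45
CENTER_BIAS_HIGH = 0.55


def center_bias_flag(overall_assessment: float) -> bool:
    """True when the assessment sits inside the center-bias band."""
    return CENTER_BIAS_LOW <= overall_assessment <= CENTER_BIAS_HIGH


print(center_bias_flag(0.5))   # midpoint assessment -> True (flagged)
print(center_bias_flag(0.7))   # confident assessment -> False
```

Because the flag is set automatically, reviewers cannot suppress it; recalibration can then down-weight flagged cards.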
Uncertainty¶
uncertainty
¶
Uncertainty-aware provenance schema models.
Provides decomposed uncertainty tracking for every step of the attribution
pipeline. These models are attached to ProvenanceEvent and
AttributionRecord objects, enabling transparent communication of why
a confidence score is what it is, not just what it is.
Academic grounding:
- UProp (Duan 2025, arXiv:2506.17419) -- intrinsic/extrinsic decomposition of uncertainty propagation across pipeline steps.
- Liu (2025, arXiv:2503.15850) -- 4-dimensional uncertainty framework (input, reasoning, parameter, prediction).
- Yanez (2025, Patterns) -- confidence-weighted source integration for multi-source fusion.
- Tian (2025, arXiv:2508.06225) -- TH-Score for overconfidence detection in LLM-based systems.
- Tripathi (2025, arXiv:2506.23464) -- H-Score and Expected Calibration Improvement (ECI) metrics.
- Zhang (2026, arXiv:2601.15778) -- trajectory-level Holistic Trajectory Calibration (HTC).
See Also
music_attribution.schemas.attribution : Uses these models in provenance. Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5 (uncertainty framework).
StepUncertainty
¶
Bases: BaseModel
Per-step uncertainty decomposition (UProp, Duan 2025).
Tracks intrinsic (data noise) and extrinsic (model/pipeline) uncertainty for each processing step in the attribution pipeline, plus an optional 4-dimensional decomposition (Liu 2025). This enables fine-grained analysis of where uncertainty enters and accumulates.
The total_uncertainty must be >= intrinsic_uncertainty
(validated at runtime), since total includes both intrinsic and
extrinsic components.
| ATTRIBUTE | DESCRIPTION |
|---|---|
step_id |
Unique identifier for this pipeline step (e.g., "etl-musicbrainz").
TYPE:
|
step_name |
Human-readable name of the pipeline step (e.g., "MusicBrainz ETL").
TYPE:
|
step_index |
Zero-based position of this step in the pipeline sequence.
TYPE:
|
stated_confidence |
Raw confidence before calibration, range [0.0, 1.0].
TYPE:
|
calibrated_confidence |
Confidence after post-hoc calibration, range [0.0, 1.0].
May differ significantly from stated_confidence.
TYPE:
|
intrinsic_uncertainty |
Uncertainty from the input data itself (noise, conflicts, missing fields), range [0.0, 1.0].
TYPE:
|
extrinsic_uncertainty |
Uncertainty from the model/algorithm (embedding limitations, threshold sensitivity), range [0.0, 1.0].
TYPE:
|
total_uncertainty |
Combined uncertainty, range [0.0, 1.0]. Must be >= intrinsic_uncertainty.
TYPE:
|
input_uncertainty |
4-D decomposition: input dimension (Liu 2025), range [0.0, 1.0]. None if 4-D decomposition not computed.
TYPE:
|
reasoning_uncertainty |
4-D decomposition: reasoning dimension, range [0.0, 1.0].
TYPE:
|
parameter_uncertainty |
4-D decomposition: parameter dimension, range [0.0, 1.0].
TYPE:
|
prediction_uncertainty |
4-D decomposition: prediction dimension, range [0.0, 1.0].
TYPE:
|
uncertainty_sources |
Classification of uncertainty sources active in this step (INTRINSIC, EXTRINSIC, ALEATORIC, EPISTEMIC).
TYPE:
|
confidence_method |
Method used to produce the confidence estimate for this step.
TYPE:
|
preceding_step_ids |
IDs of pipeline steps that feed into this step. Used for UProp uncertainty propagation tracking.
TYPE:
|
Examples:
>>> step = StepUncertainty(
... step_id="etl-musicbrainz",
... step_name="MusicBrainz ETL",
... step_index=0,
... stated_confidence=0.87,
... calibrated_confidence=0.82,
... intrinsic_uncertainty=0.10,
... extrinsic_uncertainty=0.05,
... total_uncertainty=0.15,
... confidence_method=ConfidenceMethodEnum.SELF_REPORT,
... )
validate_total_ge_intrinsic
¶
validate_total_ge_intrinsic() -> StepUncertainty
total_uncertainty must be >= intrinsic_uncertainty.
Source code in src/music_attribution/schemas/uncertainty.py
SourceContribution
¶
Bases: BaseModel
Per-source confidence with calibration quality (Yanez 2025).
Tracks how much each data source contributed to the final attribution,
with calibration quality indicating how reliable that source's
confidence estimates historically are. Sources with higher
calibration_quality receive higher weights in the aggregation.
| ATTRIBUTE | DESCRIPTION |
|---|---|
source |
The data source (e.g., MUSICBRAINZ, DISCOGS, ARTIST_INPUT).
TYPE:
|
confidence |
This source's confidence in its contribution, range [0.0, 1.0].
TYPE:
|
weight |
Normalised weight of this source in the final aggregation, range [0.0, 1.0]. Weights across all sources sum to 1.0.
TYPE:
|
calibration_quality |
Historical calibration quality of this source's confidence estimates, range [0.0, 1.0]. 1.0 = perfectly calibrated (stated confidence matches empirical accuracy).
TYPE:
|
record_count |
Number of records this source contributed to the attribution. Non-negative.
TYPE:
|
is_human |
Whether this source is human-provided (e.g., ARTIST_INPUT). Human sources may receive preferential weighting for subjective fields.
TYPE:
|
Examples:
>>> contrib = SourceContribution(
... source=SourceEnum.MUSICBRAINZ,
... confidence=0.90,
... weight=0.45,
... calibration_quality=0.85,
... record_count=3,
... )
CalibrationMetadata
¶
Bases: BaseModel
Per-step calibration metrics (Tian 2025 TH-Score).
Records calibration quality for confidence scores, including expected calibration error (ECE), calibration set size, and the method used. Lower ECE indicates better calibration (stated confidence matches empirical accuracy).
| ATTRIBUTE | DESCRIPTION |
|---|---|
expected_calibration_error |
Expected Calibration Error (ECE), the average absolute difference between confidence and accuracy across bins. Non-negative. Lower is better; 0.0 = perfectly calibrated.
TYPE:
|
calibration_set_size |
Number of examples used for calibration. Larger sets give more reliable ECE estimates. Non-negative.
TYPE:
|
status |
Current calibration status (CALIBRATED, UNCALIBRATED, PENDING).
TYPE:
|
method |
Name of the calibration method used (e.g., "platt_scaling").
TYPE:
|
Examples:
>>> cal = CalibrationMetadata(
... expected_calibration_error=0.03,
... calibration_set_size=500,
... status=CalibrationStatusEnum.CALIBRATED,
... method="platt_scaling",
... )
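Expected Calibration Error can be computed with the standard binned estimator, sketched below (the library's exact binning scheme is not shown in this documentation; equal-width bins over [0, 1] are an assumption):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Clamp c == 1.0 into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece


# Overconfident toy set: stated 0.9 but only half correct -> ECE 0.4.
print(round(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]), 2))
```

Lower is better: a perfectly calibrated system (stated confidence equals empirical accuracy in every bin) scores 0.0.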
OverconfidenceReport
¶
Bases: BaseModel
Overconfidence detection report (Tripathi 2025 H-Score, ECI).
Detects when stated confidence exceeds actual accuracy, a common
failure mode in LLM-based systems. The overconfidence_gap is
the primary diagnostic: positive = overconfident, negative =
underconfident, zero = perfectly calibrated.
| ATTRIBUTE | DESCRIPTION |
|---|---|
stated_confidence |
The system's stated confidence, range [0.0, 1.0].
TYPE:
|
actual_accuracy |
Empirically measured accuracy on a validation set, range [0.0, 1.0].
TYPE:
|
overconfidence_gap |
Signed gap stated_confidence - actual_accuracy. Positive = overconfident, negative = underconfident, zero = perfectly calibrated.
TYPE:
|
th_score |
TH-Score from Tian (2025). Measures hallucination tendency. None if not computed.
TYPE:
|
h_score |
H-Score from Tripathi (2025). Measures honesty of confidence estimates. None if not computed.
TYPE:
|
eci |
Expected Calibration Improvement (ECI) from Tripathi (2025). How much calibration could be improved. None if not computed.
TYPE:
|
Examples:
>>> report = OverconfidenceReport(
... stated_confidence=0.92,
... actual_accuracy=0.85,
... overconfidence_gap=0.07,
... th_score=0.12,
... )
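The primary diagnostic is simple arithmetic, shown here as a sketch of the documented definition:

```python
def overconfidence_gap(stated_confidence: float, actual_accuracy: float) -> float:
    """Signed gap: positive = overconfident, negative = underconfident."""
    return stated_confidence - actual_accuracy


gap = overconfidence_gap(0.92, 0.85)
print(round(gap, 2))  # 0.07 -- the system overstates its accuracy
```

In the example above the system claims 92% confidence but is right only 85% of the time, the overconfidence failure mode this report exists to surface.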
TrajectoryCalibration
¶
Bases: BaseModel
Trajectory-level calibration (Zhang 2026, HTC).
Tracks confidence dynamics across the full pipeline, treating the sequence of confidence scores at each step as a trajectory. The trajectory shape (increasing, decreasing, stable, volatile) is a powerful signal for calibration: volatile trajectories often indicate unreliable final confidence.
The optional htc_feature_vector is a 48-dimensional feature vector
extracted from the trajectory for use with the HTC calibration method.
| ATTRIBUTE | DESCRIPTION |
|---|---|
trajectory_id |
Unique identifier for this trajectory (typically matches the attribution record ID).
TYPE:
|
step_count |
Number of pipeline steps in the trajectory. Minimum 1.
TYPE:
|
confidence_trend |
Classified trend of confidence across steps (INCREASING, DECREASING, STABLE, VOLATILE).
TYPE:
|
initial_confidence |
Confidence at the first pipeline step, range [0.0, 1.0].
TYPE:
|
final_confidence |
Confidence at the last pipeline step, range [0.0, 1.0].
TYPE:
|
htc_feature_vector |
48-dimensional feature vector for HTC calibration (Zhang 2026). Must be exactly length 48 when provided. None if HTC is not used.
TYPE:
|
Examples:
>>> traj = TrajectoryCalibration(
... trajectory_id="attr-12345",
... step_count=4,
... confidence_trend=ConfidenceTrendEnum.INCREASING,
... initial_confidence=0.65,
... final_confidence=0.92,
... )
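Trend classification over a confidence trajectory can be sketched as follows. The 0.02 tolerance and the "both directions exceed tolerance" volatility rule are assumptions for illustration, not the library's exact logic:

```python
def classify_trend(confidences: list[float], tol: float = 0.02) -> str:
    """Classify step-to-step confidence deltas into one of four trends."""
    deltas = [b - a for a, b in zip(confidences, confidences[1:])]
    if all(abs(d) <= tol for d in deltas):
        return "STABLE"
    if any(d > tol for d in deltas) and any(d < -tol for d in deltas):
        return "VOLATILE"  # swings in both directions
    if all(d >= -tol for d in deltas):
        return "INCREASING"
    return "DECREASING"


print(classify_trend([0.65, 0.72, 0.85, 0.92]))  # INCREASING
print(classify_trend([0.65, 0.90, 0.55, 0.80]))  # VOLATILE
```

As the class docstring notes, a VOLATILE result is the one to watch: large swings suggest the final confidence should be trusted less, regardless of its absolute value.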
validate_htc_vector_length
classmethod
¶
HTC feature vector must be length 48 when provided.
Source code in src/music_attribution/schemas/uncertainty.py
UncertaintyAwareProvenance
¶
Bases: BaseModel
Top-level uncertainty summary for an AttributionRecord.
Aggregates step-level uncertainties, source contributions,
calibration metadata, overconfidence diagnostics, and trajectory
calibration into a single summary. Attached to each
AttributionRecord as uncertainty_summary.
This is the primary structure for answering "why is the confidence what it is?" -- enabling transparent uncertainty communication to end users and downstream systems.
| ATTRIBUTE | DESCRIPTION |
|---|---|
steps |
Per-step uncertainty decomposition for each pipeline step.
Ordered by step_index.
TYPE:
|
source_contributions |
Per-source confidence and weight breakdown. Shows how much each data source influenced the final score.
TYPE:
|
calibration |
Overall calibration metrics for the record's confidence score. None if calibration has not been performed.
TYPE:
|
overconfidence |
Overconfidence diagnostic report. None if not computed.
TYPE:
|
trajectory |
Trajectory-level calibration data (HTC). None if trajectory analysis was not performed.
TYPE:
|
total_uncertainty |
Aggregated total uncertainty across all steps, range [0.0, 1.0]. Defaults to 0.0.
TYPE:
|
dominant_uncertainty_source |
The primary source of uncertainty in this record (INTRINSIC, EXTRINSIC, ALEATORIC, or EPISTEMIC). None if not determined.
TYPE:
|
Examples:
>>> summary = UncertaintyAwareProvenance(
... total_uncertainty=0.18,
... dominant_uncertainty_source=UncertaintySourceEnum.EPISTEMIC,
... )
See Also
AttributionRecord : Parent record containing this summary. StepUncertainty : Per-step decomposition detail.
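As a hedged illustration only (the documentation does not specify the aggregation rule, so this is one common assumption, not the library's method), per-step uncertainties could be combined with a complement-of-products rule, which treats steps as independent:

```python
import math


def aggregate_total_uncertainty(step_uncertainties: list[float]) -> float:
    """Complement-of-products aggregation (illustrative assumption):
    a record is fully certain only if every step is fully certain."""
    certainty = math.prod(1.0 - u for u in step_uncertainties)
    return 1.0 - certainty


# Three pipeline steps with modest individual uncertainty still
# accumulate noticeable total uncertainty.
total = aggregate_total_uncertainty([0.15, 0.05, 0.10])
print(round(total, 4))
```

Whatever the actual rule, the resulting total_uncertainty and dominant_uncertainty_source together answer the "why is the confidence what it is?" question this structure exists for.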