Schemas¶

Pydantic boundary objects that define the data contracts between pipelines.

Enums¶

enums ¶

Shared enumerations for boundary object schemas.

All enums are StrEnum for JSON-friendly serialization, ensuring that values survive round-trips through JSON/YAML without requiring custom serializers. Domain-extensible enums (EntityTypeEnum, RelationshipTypeEnum, PermissionTypeEnum) can be extended via domain overlay YAML in a future phase.

This module provides the single source of truth for all categorical values used across the five-pipeline architecture (ETL, Entity Resolution, Attribution Engine, API/MCP Server, Chat Interface). Enums are grouped by domain:

Core pipeline -- source identification, entity types, resolution
Attribution -- credit roles, assurance levels, provenance events
Permissions -- MCP Permission Patchbay types, values, scopes
Uncertainty -- uncertainty decomposition taxonomy, calibration
Commercial landscape -- training attribution, watermarking, revenue
Regulatory/compliance -- EU AI Act, ISO 42001, DSM Directive

See Also

Teikari, P. (2026). Music Attribution with Transparent Confidence. SSRN No. 6109087 -- sections 5-7 for assurance levels and permission patchbay design.

SourceEnum ¶

Bases: StrEnum

Data source identifiers for the ETL pipeline.

Each value represents an external data source from which music metadata is ingested. The source identity is preserved throughout the entire pipeline so that downstream confidence scoring can apply per-source reliability weights. See Teikari (2026), section 5.

ATTRIBUTE	DESCRIPTION
`MUSICBRAINZ`	MusicBrainz open database. Community-curated, high coverage for Western popular music. Provides MBIDs. TYPE: `str`
`DISCOGS`	Discogs marketplace/database. Strong on vinyl releases, credits, and label information. Provides Discogs numeric IDs. TYPE: `str`
`ACOUSTID`	AcoustID audio fingerprint service. Matches audio signals to MusicBrainz recordings via Chromaprint fingerprints. TYPE: `str`
`ARTIST_INPUT`	Direct input from the artist or their representative. Highest authority for creative intent, but unverifiable by third parties. TYPE: `str`
`FILE_METADATA`	Embedded file metadata (ID3 tags, Vorbis comments, etc.) extracted from the audio file itself. Quality varies widely. TYPE: `str`

Examples:

>>> SourceEnum.MUSICBRAINZ
<SourceEnum.MUSICBRAINZ: 'MUSICBRAINZ'>
>>> SourceEnum("ARTIST_INPUT")
<SourceEnum.ARTIST_INPUT: 'ARTIST_INPUT'>

EntityTypeEnum ¶

Bases: StrEnum

Music entity types within the knowledge graph.

Follows the MusicBrainz entity model with extensions for credits. Entity resolution produces a unified graph where each node is typed by one of these values.

ATTRIBUTE	DESCRIPTION
`RECORDING`	A specific audio recording (a unique performance captured in a studio or live). Identified by ISRC. TYPE: `str`
`WORK`	An abstract musical composition independent of any recording. Identified by ISWC. A single work may have many recordings. TYPE: `str`
`ARTIST`	A person or group who creates or performs music. Identified by ISNI or IPI. TYPE: `str`
`RELEASE`	A packaged product (album, single, EP) containing one or more recordings. TYPE: `str`
`LABEL`	A record label or publishing entity that owns or distributes releases. TYPE: `str`
`CREDIT`	A specific attribution credit linking an artist to a recording or work in a particular role (e.g., producer, songwriter). TYPE: `str`

Examples:

>>> EntityTypeEnum.RECORDING
<EntityTypeEnum.RECORDING: 'RECORDING'>

RelationshipTypeEnum ¶

Bases: StrEnum

Relationship types between music entities in the knowledge graph.

Edges in the entity graph are typed by these values. Each relationship connects a source entity (typically an artist) to a target entity (typically a recording or work). Relationship types map loosely to CreditRoleEnum but represent graph edges rather than attribution line items.

ATTRIBUTE	DESCRIPTION
`PERFORMED`	Artist performed on a recording (vocalist, instrumentalist). TYPE: `str`
`WROTE`	Artist wrote the underlying composition (songwriter, composer). TYPE: `str`
`PRODUCED`	Artist produced the recording (creative/technical oversight). TYPE: `str`
`ENGINEERED`	Artist served as recording engineer. TYPE: `str`
`ARRANGED`	Artist arranged the composition for a specific performance. TYPE: `str`
`MASTERED`	Artist mastered the final audio (mastering engineer). TYPE: `str`
`MIXED`	Artist mixed the recording (mixing engineer). TYPE: `str`
`FEATURED`	Artist is a featured guest on the recording. TYPE: `str`
`SAMPLED`	Recording contains a sample from the target recording. TYPE: `str`
`REMIXED`	Recording is a remix of the target recording. TYPE: `str`

ResolutionMethodEnum ¶

Bases: StrEnum

Entity resolution methods used to merge NormalizedRecords.

The Entity Resolution pipeline may use one or more of these methods to determine whether two NormalizedRecords refer to the same real-world entity. Methods are listed roughly in order of increasing cost, from exact identifier matches to human review. The chosen method is recorded on each ResolvedEntity for provenance tracing.

ATTRIBUTE	DESCRIPTION
`EXACT_ID`	Exact match on a standard identifier (ISRC, ISWC, ISNI, MBID). Highest confidence, lowest cost. TYPE: `str`
`FUZZY_STRING`	Fuzzy string matching on names/titles (e.g., Levenshtein, Jaro-Winkler). Handles typos and transliterations. TYPE: `str`
`EMBEDDING`	Semantic similarity via vector embeddings (e.g., sentence transformers). Handles paraphrases and abbreviations. TYPE: `str`
`GRAPH`	Graph-based resolution using relationship structure (e.g., Splink). Exploits co-occurrence patterns in the entity graph. TYPE: `str`
`LLM`	LLM-assisted resolution for ambiguous cases. Most expensive, used as a fallback when other methods are inconclusive. TYPE: `str`
`MANUAL`	Human-in-the-loop resolution by a domain expert. Triggered when automated methods produce low-confidence matches. TYPE: `str`

AssuranceLevelEnum ¶

Bases: StrEnum

Tiered provenance classification (A0--A3).

Maps to the assurance framework from Teikari (2026), section 6. Higher levels require stronger evidence chains. The assurance level determines how much trust downstream consumers can place in an attribution record.

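Ordered by verification depth: LEVEL_0 < LEVEL_1 < LEVEL_2 < LEVEL_3. Because members are strings, these comparisons are lexicographic; they give the intended order only because the member values happen to sort that way.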

ATTRIBUTE	DESCRIPTION
`LEVEL_0`	No provenance data. Self-declared or unknown origin. Corresponds to A0 in the manuscript. TYPE: `str`
`LEVEL_1`	Single source. Documented but not independently verified. Corresponds to A1. Typical for file-metadata-only records. TYPE: `str`
`LEVEL_2`	Multiple sources agree. Cross-referenced and corroborated across at least two independent data sources. Corresponds to A2. TYPE: `str`
`LEVEL_3`	Artist-verified or authority-verified. Highest assurance level. Requires explicit confirmation from the rights holder or an authoritative registry (e.g., ISNI). Corresponds to A3. TYPE: `str`

Examples:

>>> AssuranceLevelEnum.LEVEL_3
<AssuranceLevelEnum.LEVEL_3: 'LEVEL_3'>
>>> AssuranceLevelEnum("LEVEL_0") < AssuranceLevelEnum("LEVEL_3")
True

ConflictSeverityEnum ¶

Bases: StrEnum

Conflict severity levels between data sources.

When entity resolution encounters disagreements between sources for the same field (e.g., different release dates or artist names), the conflict is assigned a severity level that determines whether it can be auto-resolved or requires human review.

ATTRIBUTE	DESCRIPTION
`LOW`	Minor discrepancy, auto-resolvable. Example: trailing whitespace differences in artist names. TYPE: `str`
`MEDIUM`	Significant discrepancy requiring attention but not blocking. Example: different release dates within the same year. TYPE: `str`
`HIGH`	Major disagreement likely indicating a data quality issue. Example: different artist names for the same ISRC. TYPE: `str`
`CRITICAL`	Fundamental conflict that blocks attribution. Example: contradictory songwriter credits from authoritative sources. Always triggers `needs_review = True`. TYPE: `str`

CreditRoleEnum ¶

Bases: StrEnum

Credit roles for music attribution.

These roles appear in Credit objects within AttributionRecord and as prediction targets in conformal prediction sets. The taxonomy covers the most common roles found across MusicBrainz, Discogs, and industry metadata standards (DDEX, CWR).

ATTRIBUTE	DESCRIPTION
`PERFORMER`	Primary performer (vocalist or lead instrumentalist). TYPE: `str`
`SONGWRITER`	Songwriter (both music and lyrics). Use `COMPOSER` or `LYRICIST` for more specific roles. TYPE: `str`
`COMPOSER`	Composed the music (melody, harmony, structure). TYPE: `str`
`LYRICIST`	Wrote the lyrics/text. TYPE: `str`
`PRODUCER`	Music producer (creative and/or technical oversight of the recording process). TYPE: `str`
`ENGINEER`	Recording/tracking engineer. TYPE: `str`
`MIXING_ENGINEER`	Mixing engineer (balance, EQ, effects in post-production). TYPE: `str`
`MASTERING_ENGINEER`	Mastering engineer (final audio processing for distribution). TYPE: `str`
`ARRANGER`	Arranged the composition for a specific performance context. TYPE: `str`
`SESSION_MUSICIAN`	Session musician (hired instrumentalist, not a band member). TYPE: `str`
`FEATURED_ARTIST`	Featured guest artist on the recording. TYPE: `str`
`CONDUCTOR`	Orchestra or ensemble conductor. TYPE: `str`
`DJ`	DJ (for electronic music, turntablism, or mix compilations). TYPE: `str`
`REMIXER`	Created a remix of the original recording. TYPE: `str`

ProvenanceEventTypeEnum ¶

Bases: StrEnum

Provenance event types for the attribution audit trail.

Every ProvenanceEvent in an AttributionRecord is typed by one of these values. Together they form an immutable audit chain showing how an attribution was constructed and refined over time.

ATTRIBUTE	DESCRIPTION
`FETCH`	Data fetched from an external source (ETL pipeline). Records which source was queried and how many records were returned. TYPE: `str`
`RESOLVE`	Entity resolution step. Records the method used and the input/output record counts. TYPE: `str`
`SCORE`	Confidence scoring/calibration step. Records the previous and new confidence values and the scoring method applied. TYPE: `str`
`REVIEW`	Human review event. Records who reviewed, which feedback card was applied, and how many corrections were made. TYPE: `str`
`UPDATE`	Record update event. Records version bump, fields changed, and what triggered the update. TYPE: `str`
`FEEDBACK`	Feedback integration event. Records that a `FeedbackCard` was processed and whether its corrections were accepted. TYPE: `str`

ReviewerRoleEnum ¶

Bases: StrEnum

Feedback reviewer roles for the FeedbackCard system.

Identifies the domain expertise of the person providing feedback. The reviewer role affects how feedback is weighted during calibration updates -- artist-provided corrections carry higher authority than fan suggestions.

ATTRIBUTE	DESCRIPTION
`ARTIST`	The artist themselves (or a confirmed representative). Highest authority for creative intent. TYPE: `str`
`MANAGER`	Artist manager or business representative. Authority for contractual and commercial metadata. TYPE: `str`
`MUSICOLOGIST`	Academic musicologist or music information retrieval expert. Authority for compositional analysis and historical context. TYPE: `str`
`PRODUCER`	Music producer who worked on the recording. Authority for session credits and technical contributions. TYPE: `str`
`FAN`	Community member / fan contributor. Valuable for crowd-sourced corrections but requires corroboration. Lowest weight. TYPE: `str`

EvidenceTypeEnum ¶

Bases: StrEnum

Evidence types supporting feedback corrections.

When a reviewer submits a FeedbackCard with corrections, they must indicate what evidence supports the correction. Evidence type affects the credibility weighting of the correction during calibration updates.

ATTRIBUTE	DESCRIPTION
`LINER_NOTES`	Physical or digital liner notes from the release packaging. Strong documentary evidence. TYPE: `str`
`MEMORY`	Personal recollection of the reviewer (e.g., artist remembering who played on a session). Subject to recall bias. TYPE: `str`
`DOCUMENT`	Contractual or legal document (e.g., publishing agreement, session contract). Strongest documentary evidence. TYPE: `str`
`SESSION_NOTES`	Studio session notes or recording logs. Strong evidence for engineering and performance credits. TYPE: `str`
`OTHER`	Other evidence type not covered above. Requires free-text explanation in the `FeedbackCard.free_text` field. TYPE: `str`

PermissionTypeEnum ¶

Bases: StrEnum

Permission types for the MCP Permission Patchbay.

Defines the universe of machine-readable permission queries that AI platforms and other consumers can issue via MCP. The taxonomy is hierarchical: AI_TRAINING is a broad category with AI_TRAINING_COMPOSITION, AI_TRAINING_RECORDING, and AI_TRAINING_STYLE as finer-grained sub-permissions.

See Teikari (2026), section 7, for the Permission Patchbay design.

ATTRIBUTE	DESCRIPTION
`STREAM`	Permission to stream the recording. TYPE: `str`
`DOWNLOAD`	Permission to download the recording for offline use. TYPE: `str`
`SYNC_LICENSE`	Synchronisation license (music paired with visual media). TYPE: `str`
`AI_TRAINING`	Broad permission for AI model training on any aspect of the work. TYPE: `str`
`AI_TRAINING_COMPOSITION`	AI training specifically on the compositional elements (melody, harmony, structure). TYPE: `str`
`AI_TRAINING_RECORDING`	AI training specifically on the recording (audio signal, mix, production qualities). TYPE: `str`
`AI_TRAINING_STYLE`	AI training on stylistic elements (timbre, groove, aesthetic). TYPE: `str`
`DATASET_INCLUSION`	Inclusion in a published research or training dataset. TYPE: `str`
`VOICE_CLONING`	Use of vocal performance for voice cloning / synthesis. TYPE: `str`
`STYLE_LEARNING`	Learning artistic style for generative imitation. TYPE: `str`
`LYRICS_IN_CHATBOTS`	Reproduction of lyrics in chatbot / LLM responses. TYPE: `str`
`COVER_VERSIONS`	Permission to create and distribute cover versions. TYPE: `str`
`REMIX`	Permission to create remixes of the recording. TYPE: `str`
`SAMPLE`	Permission to sample portions of the recording. TYPE: `str`
`DERIVATIVE_WORK`	Broad permission for any derivative work not covered above. TYPE: `str`

PermissionValueEnum ¶

Bases: StrEnum

Permission response values for MCP consent queries.

Each permission entry in a PermissionBundle resolves to one of these values. The values form a spectrum from unconditional denial to unconditional allowance, with conditional variants in between.

ATTRIBUTE	DESCRIPTION
`ALLOW`	Unconditional permission granted. TYPE: `str`
`DENY`	Permission explicitly denied. No exceptions. TYPE: `str`
`ASK`	Permission not pre-determined; the requester must contact the rights holder for case-by-case approval. TYPE: `str`
`ALLOW_WITH_ATTRIBUTION`	Permission granted on condition that proper attribution is included. Requires `PermissionEntry.attribution_requirement` to specify the required attribution text. TYPE: `str`
`ALLOW_WITH_ROYALTY`	Permission granted on condition of royalty payment. Requires `PermissionEntry.royalty_rate` to specify the rate. TYPE: `str`

PermissionScopeEnum ¶

Bases: StrEnum

Permission scope levels defining granularity of consent.

Permissions can be set at different levels of granularity, from an entire catalog down to a single work. Broader scopes act as defaults that can be overridden by narrower scopes.

ATTRIBUTE	DESCRIPTION
`CATALOG`	Applies to the entire catalog of the rights holder. Broadest scope. When scope is CATALOG, `scope_entity_id` must be None. TYPE: `str`
`RELEASE`	Applies to a specific release (album, EP, single). Requires `scope_entity_id` pointing to the release entity. TYPE: `str`
`RECORDING`	Applies to a specific recording. Requires `scope_entity_id` pointing to the recording entity. TYPE: `str`
`WORK`	Applies to a specific musical work (composition). Requires `scope_entity_id` pointing to the work entity. TYPE: `str`

DelegationRoleEnum ¶

Bases: StrEnum

Delegation chain roles in the permission hierarchy.

A PermissionBundle may include a delegation chain showing who granted permission authority to whom. This enables audit trails for permission provenance (e.g., artist -> manager -> label -> distributor).

ATTRIBUTE	DESCRIPTION
`OWNER`	Original rights holder (typically the artist or songwriter). Root of the delegation chain. TYPE: `str`
`MANAGER`	Artist manager or business representative acting on behalf of the owner. TYPE: `str`
`LABEL`	Record label holding master recording rights via contract. TYPE: `str`
`DISTRIBUTOR`	Digital distributor handling platform delivery. Typically the outermost link in the delegation chain. TYPE: `str`

PipelineFeedbackTypeEnum ¶

Bases: StrEnum

Pipeline feedback signal types for continuous improvement.

These are reverse-flow signals between pipelines, enabling the system to self-correct. For example, the Attribution Engine can signal back to Entity Resolution that its confidence estimates were miscalibrated, or the API layer can signal a dispute.

ATTRIBUTE	DESCRIPTION
`REFETCH`	Signal from Entity Resolution to ETL: "data from source X is consistently wrong or stale, re-fetch from the source." TYPE: `str`
`RECALIBRATE`	Signal from Attribution Engine to Entity Resolution: "resolution confidence was miscalibrated; predicted confidence differs significantly from actual accuracy." TYPE: `str`
`DISPUTE`	Signal from API/Chat to Attribution Engine: "a user or rights holder has disputed this attribution; re-evaluate." TYPE: `str`
`STALE`	Signal from any pipeline: "this record has not been refreshed within its expected freshness window." TYPE: `str`

UncertaintySourceEnum ¶

Bases: StrEnum

Uncertainty source taxonomy based on UProp (Duan 2025).

Classifies the origin of uncertainty in confidence estimates. The intrinsic/extrinsic decomposition (Duan 2025, arXiv:2506.17419) is the primary axis; aleatoric/epistemic is the classical secondary axis for compatibility with standard ML uncertainty literature.

ATTRIBUTE	DESCRIPTION
`INTRINSIC`	Intrinsic uncertainty arising from noise in the input data itself (e.g., conflicting metadata across sources, ambiguous artist names). TYPE: `str`
`EXTRINSIC`	Extrinsic uncertainty arising from the model or pipeline (e.g., embedding model limitations, resolution algorithm edge cases). TYPE: `str`
`ALEATORIC`	Irreducible uncertainty inherent in the data-generating process. Cannot be reduced by collecting more data. TYPE: `str`
`EPISTEMIC`	Reducible uncertainty due to limited knowledge or data. Can be reduced by collecting more evidence or better models. TYPE: `str`

UncertaintyDimensionEnum ¶

Bases: StrEnum

4-dimensional uncertainty framework (Liu 2025, arXiv:2503.15850).

Orthogonal to the intrinsic/extrinsic decomposition, this framework decomposes uncertainty along the information processing pipeline: from input through reasoning to final prediction.

ATTRIBUTE	DESCRIPTION
`INPUT`	Uncertainty in the input data (noise, missing fields, ambiguity). Maps to `StepUncertainty.input_uncertainty`. TYPE: `str`
`REASONING`	Uncertainty in the reasoning/inference process (e.g., entity resolution logic, LLM chain-of-thought). Maps to `StepUncertainty.reasoning_uncertainty`. TYPE: `str`
`PARAMETER`	Uncertainty in model parameters (e.g., embedding model weights, fuzzy matching thresholds). Maps to `StepUncertainty.parameter_uncertainty`. TYPE: `str`
`PREDICTION`	Uncertainty in the final prediction/output (e.g., the confidence score itself). Maps to `StepUncertainty.prediction_uncertainty`. TYPE: `str`

ConfidenceMethodEnum ¶

Bases: StrEnum

Methods used to produce confidence scores.

Each StepUncertainty records which method was used to generate its confidence estimate. Methods vary in cost, reliability, and calibration quality. See Teikari (2026), section 5, for the confidence scoring framework.

ATTRIBUTE	DESCRIPTION
`SELF_REPORT`	Source-reported confidence (e.g., MusicBrainz data quality rating). Cheapest but least calibrated. TYPE: `str`
`MULTI_SAMPLE`	Multiple-sample consistency (e.g., querying an LLM multiple times and measuring agreement). TYPE: `str`
`LOGPROB`	Token log-probability from an LLM. Fast but requires logprob API access. TYPE: `str`
`ENSEMBLE`	Ensemble of multiple models or methods. More expensive but better calibrated than single-model approaches. TYPE: `str`
`CONFORMAL`	Conformal prediction providing coverage guarantees. Produces prediction sets rather than point estimates. TYPE: `str`
`SOURCE_WEIGHTED`	Weighted average across data sources based on historical reliability (Yanez 2025 approach). TYPE: `str`
`HUMAN_RATED`	Human expert rating. Highest authority but most expensive and slowest. TYPE: `str`
`HTC`	Holistic Trajectory Calibration (Zhang 2026, arXiv:2601.15778). Uses trajectory-level features across the full pipeline for calibration. TYPE: `str`

CalibrationStatusEnum ¶

Bases: StrEnum

Calibration status of a confidence score.

Indicates whether a confidence score has been post-hoc calibrated (e.g., via Platt scaling or isotonic regression) to ensure that stated confidence matches empirical accuracy.

ATTRIBUTE	DESCRIPTION
`CALIBRATED`	Score has been calibrated against a held-out calibration set. `CalibrationMetadata.expected_calibration_error` is meaningful. TYPE: `str`
`UNCALIBRATED`	Score is raw/uncalibrated. May exhibit over- or under-confidence. TYPE: `str`
`PENDING`	Calibration is pending (insufficient calibration data collected so far). Score should be treated as uncalibrated. TYPE: `str`
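For context on what `expected_calibration_error` measures, here is a minimal sketch of the common equal-width-bin estimator, ECE = sum over bins of (n_b / N) * |accuracy_b - mean_confidence_b|. The function below is an illustration under that standard definition; it is not taken from this project, and the bin count is an assumption.

```python
def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE over confidences in [0, 1]."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of samples.
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In the example, four predictions at 0.9 confidence with three correct give a gap of 0.15 between stated confidence and observed accuracy.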

ConfidenceTrendEnum ¶

Bases: StrEnum

Confidence trend across pipeline steps (Zhang 2026).

Characterises the trajectory of confidence scores as a record passes through the pipeline. Used by TrajectoryCalibration for HTC-based calibration. See Zhang (2026, arXiv:2601.15778).

ATTRIBUTE	DESCRIPTION
`INCREASING`	Confidence monotonically increases across steps. Typical when multiple corroborating sources are found. TYPE: `str`
`DECREASING`	Confidence monotonically decreases. May indicate conflicting evidence discovered during resolution. TYPE: `str`
`STABLE`	Confidence remains approximately constant. Typical for records with strong initial evidence (e.g., exact ID match). TYPE: `str`
`VOLATILE`	Confidence oscillates across steps. May indicate unstable resolution or contradictory evidence. Often triggers `needs_review = True`. TYPE: `str`

AttributionMethodEnum ¶

Bases: StrEnum

Training data attribution (TDA) methods.

Future-readiness stubs for commercial landscape parity with Musical AI, Sureel, ProRata, and Sony's influence-function approach. These methods attempt to quantify how much a specific training example influenced a generative model's output.

ATTRIBUTE	DESCRIPTION
`TRAINING_TIME_INFLUENCE`	Influence measured at training time (e.g., data Shapley values, TracIn). Requires access to training checkpoints. TYPE: `str`
`UNLEARNING_BASED`	Influence measured via machine unlearning (retrain-without and compare). Expensive but theoretically sound. TYPE: `str`
`INFLUENCE_FUNCTIONS`	Classical influence functions (Koh & Liang 2017). Approximates leave-one-out retraining via Hessian-vector products. TYPE: `str`
`EMBEDDING_SIMILARITY`	Cosine similarity in embedding space between source and generated content. Cheapest but least rigorous. TYPE: `str`
`WATERMARK_DETECTION`	Detection of embedded watermarks in generated content that trace back to training data (e.g., SynthID, AudioSeal). TYPE: `str`
`INFERENCE_TIME_CONDITIONING`	Attribution via inference-time conditioning or prompting (e.g., Musical AI's approach of conditioning generation on a known work). TYPE: `str`

RightsTypeEnum ¶

Bases: StrEnum

Music rights types distinguishing compositional vs recording rights.

Future-readiness stub. In music licensing, rights are split between the composition (publishing) side and the recording (master) side. This distinction is critical for AI training attribution: a model may learn from the composition, the recording, or both.

Based on Sureel patent and LANDR rights management approaches.

ATTRIBUTE	DESCRIPTION
`MASTER_RECORDING`	Rights in the specific audio recording (sound recording copyright). Typically held by the label or artist. TYPE: `str`
`COMPOSITION_PUBLISHING`	Rights in the underlying composition (musical work copyright). Typically held by the publisher or songwriter. TYPE: `str`
`PERFORMANCE`	Performance rights (public performance, broadcast). Managed by PROs (ASCAP, BMI, PRS, GEMA, etc.). TYPE: `str`
`MECHANICAL`	Mechanical reproduction rights (physical copies, downloads, interactive streams). TYPE: `str`
`SYNC`	Synchronisation rights (pairing music with visual media). TYPE: `str`

MediaTypeEnum ¶

Bases: StrEnum

Multi-modal attribution media types.

Future-readiness stub for multi-modal training data attribution. While this scaffold focuses on audio, the Sureel and ProRata approaches are modality-agnostic and support cross-modal attribution.

ATTRIBUTE	DESCRIPTION
`AUDIO`	Audio content (waveform, spectrogram). Primary modality for this scaffold. TYPE: `str`
`IMAGE`	Image content (album art, spectrograms as images). TYPE: `str`
`VIDEO`	Video content (music videos, live performances). TYPE: `str`
`TEXT`	Text content (lyrics, liner notes, reviews). TYPE: `str`
`SYMBOLIC_MUSIC`	Symbolic music representations (MIDI, MusicXML, ABC notation). TYPE: `str`
`MULTIMODAL`	Content spanning multiple modalities simultaneously. TYPE: `str`

CertificationTypeEnum ¶

Bases: StrEnum

External certification and compliance attestation types.

Future-readiness stub for third-party certifications that validate an AI system's training data practices. These certifications are attached to ComplianceAttestation records.

ATTRIBUTE	DESCRIPTION
`FAIRLY_TRAINED_LICENSED`	Fairly Trained certification indicating all training data was licensed or in the public domain. TYPE: `str`
`C2PA_PROVENANCE`	C2PA (Coalition for Content Provenance and Authenticity) provenance manifest attached to generated content. TYPE: `str`
`EU_AI_ACT_COMPLIANT`	Self-declared or audited compliance with EU AI Act requirements for general-purpose AI (GPAI) models. TYPE: `str`
`CMO_APPROVED`	Approved by a Collective Management Organisation (CMO) such as GEMA, PRS, or ASCAP for training data usage. TYPE: `str`

WatermarkTypeEnum ¶

Bases: StrEnum

Audio watermark types for provenance tracking.

Future-readiness stub for audio watermarking systems that embed imperceptible identifiers in audio signals. Watermarks enable post-hoc attribution of AI-generated content back to training data or generation source.

ATTRIBUTE	DESCRIPTION
`SYNTHID`	Google DeepMind's SynthID audio watermarking. Embeds identifiers in spectrogram space. TYPE: `str`
`AUDIOSEAL`	Meta's AudioSeal. Localised audio watermarking with detector that identifies watermarked segments. TYPE: `str`
`WAVMARK`	WavMark academic watermarking approach. Embeds in the waveform domain. TYPE: `str`
`DIGIMARC`	Digimarc commercial watermarking. Used in broadcast monitoring and content identification. TYPE: `str`

RevenueModelEnum ¶

Bases: StrEnum

Revenue sharing models for AI-generated music attribution.

Future-readiness stub for commercial revenue distribution models. Different platforms use different approaches to compensate rights holders whose works contributed to AI training.

ATTRIBUTE	DESCRIPTION
`FLAT_FEE_UPFRONT`	One-time flat fee paid for training data licensing (e.g., LANDR model for stem packs). TYPE: `str`
`PRO_RATA_MONTHLY`	Monthly pro-rata distribution based on catalog size or usage (e.g., streaming royalty model applied to AI training). TYPE: `str`
`PER_GENERATION`	Payment per generation event that uses the rights holder's contribution (e.g., Kits AI voice model usage). TYPE: `str`
`INFLUENCE_BASED`	Payment proportional to measured influence on generated output (e.g., Musical AI / Sureel approach using TDA methods). TYPE: `str`
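As a worked example of the INFLUENCE_BASED idea, the sketch below splits a payment pool in proportion to influence scores. The `influence_payouts` helper, the holder names, and the scores are hypothetical; how influence is actually measured is outside the scope of this stub.

```python
def influence_payouts(pool: float, influence: dict[str, float]) -> dict[str, float]:
    """Split `pool` among rights holders in proportion to their influence scores."""
    total = sum(influence.values())
    if total <= 0:
        raise ValueError("total influence must be positive")
    return {holder: pool * score / total for holder, score in influence.items()}


# Hypothetical scores: holder "a" has three times the influence of holder "b".
payouts = influence_payouts(100.0, {"a": 3.0, "b": 1.0})
```

With a pool of 100 and scores of 3 and 1, the holders receive 75 and 25 respectively.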

RegulatoryFrameworkEnum ¶

Bases: StrEnum

Applicable regulatory and governance frameworks.

ISO 42001 defines internal AI governance roles; EU AI Act defines supply chain liability actors. They have zero terminological overlap and must be tracked separately. See Teikari (2026), section 8, for the regulatory mapping.

ATTRIBUTE	DESCRIPTION
`ISO_42001`	ISO/IEC 42001 AI Management System standard. Defines internal governance roles (Top Management, AI System Owner, Internal Audit). TYPE: `str`
`EU_AI_ACT`	EU Artificial Intelligence Act (Regulation 2024/1689). Defines risk categories and obligations for AI system providers/deployers. TYPE: `str`
`GPAI_CODE_OF_PRACTICE`	General-Purpose AI Model Code of Practice (July 2025). Specifies transparency and copyright compliance requirements for GPAI models. TYPE: `str`
`DSM_DIRECTIVE`	EU Digital Single Market Directive (2019/790). Art. 3-4 govern text-and-data mining exceptions and opt-out mechanisms. TYPE: `str`
`ESPR_DPP`	EU Ecodesign for Sustainable Products Regulation / Digital Product Passport. Cross-domain provenance framework. TYPE: `str`
`GDPR`	EU General Data Protection Regulation. Relevant when attribution records contain personal data (artist identities, reviewer info). TYPE: `str`

ComplianceActorEnum ¶

Bases: StrEnum

EU AI Act supply chain actors (Art. 3).

These are distinct from ISO 42001 internal governance roles (Top Management, AI System Owner, Internal Audit). An organization may simultaneously hold multiple actor classifications across different AI systems.

ATTRIBUTE	DESCRIPTION
`PROVIDER`	Entity that develops or has an AI system developed and places it on the market or puts it into service (Art. 3(3)). TYPE: `str`
`DEPLOYER`	Entity that uses an AI system under its authority (Art. 3(4)). The music platform using the attribution system. TYPE: `str`
`AUTHORISED_REPRESENTATIVE`	Entity established in the EU mandated by a non-EU provider to act on their behalf (Art. 3(5)). TYPE: `str`
`IMPORTER`	Entity established in the EU that places an AI system from a third country on the EU market (Art. 3(6)). TYPE: `str`
`DISTRIBUTOR`	Entity in the supply chain that makes an AI system available on the EU market (Art. 3(7)). TYPE: `str`
`PRODUCT_MANUFACTURER`	Manufacturer of a product that integrates an AI system as a safety component (named in the Art. 3(8) definition of "operator"). TYPE: `str`

TdmReservationMethodEnum ¶

Bases: StrEnum

Text-and-data-mining rights reservation methods.

Under EU DSM Directive Art. 4, copyright holders can opt out of TDM via machine-readable reservation. The GPAI Code of Practice (July 2025) requires providers to respect robots.txt and emerging protocols. Music has a structural gap: robots.txt is web-only and does not cover audio content accessed via APIs or streaming platforms.

See Teikari (2026), section 7, for the music-specific gap analysis.

ATTRIBUTE	DESCRIPTION
`ROBOTS_TXT`	Standard robots.txt file on web servers. Web-only; does not cover audio files served via APIs or streaming platforms. TYPE: `str`
`LLMS_TXT`	Emerging llms.txt protocol for specifying LLM training permissions at the domain level. TYPE: `str`
`MACHINE_READABLE_TAG`	HTML meta tags or HTTP headers expressing TDM reservation (e.g., `<meta name="tdm-reservation" content="1">`). TYPE: `str`
`RIGHTS_RESERVATION_API`	Programmatic API for querying rights reservation status. More flexible than static files but requires infrastructure. TYPE: `str`
`MCP_PERMISSION_QUERY`	Model Context Protocol permission query. The approach advocated by this scaffold: machine-readable consent queries via MCP tools. TYPE: `str`

Normalized Record (ETL Output)¶

normalized ¶

NormalizedRecord boundary object schema (BO-1).

Output of the Data Engineering (ETL) pipeline. A single music entity normalized from one external source. Multiple NormalizedRecord instances for the same real-world entity (from different sources) feed into the Entity Resolution pipeline, which merges them into a ResolvedEntity.

This module defines the first boundary object in the five-pipeline architecture. All ETL extractors -- regardless of source format -- produce NormalizedRecord instances with a common schema, enabling uniform downstream processing.

See Also

music_attribution.schemas.resolved : The next boundary object in the pipeline.

Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.

IdentifierBundle ¶

Bases: BaseModel

Standard music industry identifiers bundle.

Collects all known standard identifiers for a music entity. At the A2/A3 assurance levels, at least one identifier must be present for machine-sourced records (MusicBrainz, Discogs, AcoustID). These identifiers are the primary key for exact-match entity resolution.

ATTRIBUTE	DESCRIPTION
`isrc`	International Standard Recording Code. 12-character alphanumeric code uniquely identifying a specific recording (e.g., `"GBAYE0601498"`). Administered by the IFPI as the international ISRC registration authority. TYPE: `str or None`
`iswc`	International Standard Musical Work Code. Identifies the underlying composition (e.g., `"T-070.238.867-3"`). Assigned by CISAC. TYPE: `str or None`
`isni`	International Standard Name Identifier. 16-digit identifier for public identities of parties (e.g., `"0000000121032683"` for Imogen Heap). Assigned by the ISNI International Agency. TYPE: `str or None`
`ipi`	Interested Party Information code. 9-11 digit code identifying rights holders in collecting society databases. TYPE: `str or None`
`mbid`	MusicBrainz Identifier. UUID assigned by MusicBrainz to any entity in their database. TYPE: `str or None`
`discogs_id`	Discogs numeric entity ID. Integer identifier in the Discogs database. TYPE: `int or None`
`acoustid`	AcoustID identifier. UUID derived from audio fingerprint (Chromaprint) matching. TYPE: `str or None`

Examples:

>>> bundle = IdentifierBundle(
...     isrc="GBAYE0601498",
...     mbid="a74b1b7f-71a5-4011-9441-d0b5e4122711",
... )
>>> bundle.has_any()
True
>>> IdentifierBundle().has_any()
False
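
Identifiers often arrive with inconsistent formatting (hyphens, lower case), so an extractor might normalise them before populating the bundle. The sketch below does this for ISRCs only. The helper name `normalize_isrc` is hypothetical, and the pattern assumes the usual 12-character layout (two-letter prefix, three alphanumeric registrant characters, two-digit year, five-digit designation); the schema itself is not documented as enforcing this.

```python
import re
from typing import Optional

# Assumed layout: 2 letters, 3 alphanumerics, 2-digit year, 5-digit designation.
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{2}[0-9]{5}")


def normalize_isrc(raw: str) -> Optional[str]:
    """Strip separators and upper-case an ISRC; return None if malformed."""
    candidate = raw.replace("-", "").replace(" ", "").upper()
    return candidate if _ISRC_RE.fullmatch(candidate) else None
```

Returning None for malformed input keeps the value compatible with the `str or None` type of the `isrc` field.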

has_any ¶

has_any() -> bool

Check if at least one identifier is set.

RETURNS	DESCRIPTION
`bool`	True if any identifier field is not None.

Source code in src/music_attribution/schemas/normalized.py

def has_any(self) -> bool:
    """Check if at least one identifier is set.

    Returns
    -------
    bool
        True if any identifier field is not None.
    """
    return any(
        v is not None
        for v in (self.isrc, self.iswc, self.isni, self.ipi, self.mbid, self.discogs_id, self.acoustid)
    )

SourceMetadata ¶

Bases: BaseModel

Typed source-specific metadata attached to a NormalizedRecord.

Contains supplementary information that varies by source but follows a common schema. Fields that do not apply to a particular source are left as their defaults (None or empty list).

ATTRIBUTE	DESCRIPTION
`roles`	Credit roles reported by the source (free-text, not yet mapped to `CreditRoleEnum`). Examples: `["performer", "producer"]`. TYPE: `list of str`
`release_date`	Release date as reported by the source. String format varies (ISO 8601 preferred, but partial dates like `"2005"` are common in MusicBrainz). TYPE: `str or None`
`release_country`	ISO 3166-1 alpha-2 country code for the release territory (e.g., `"GB"`, `"US"`). TYPE: `str or None`
`genres`	Genre tags reported by the source. Free-text, not standardised across sources. TYPE: `list of str`
`duration_ms`	Track duration in milliseconds. May differ between sources due to different mastering or silence handling. TYPE: `int or None`
`track_number`	Track position within the release medium. TYPE: `int or None`
`medium_format`	Physical or digital medium format (e.g., `"CD"`, `"Vinyl"`, `"Digital Media"`). TYPE: `str or None`
`language`	ISO 639-1 language code for lyrics/vocals (e.g., `"en"`). TYPE: `str or None`
`extras`	Catch-all for source-specific fields that do not map to the common schema. Keys and values are both strings. TYPE: `dict of str to str`

Examples:

>>> meta = SourceMetadata(
...     roles=["performer", "songwriter"],
...     release_date="2005-10-17",
...     release_country="GB",
...     genres=["electronic", "art pop"],
...     duration_ms=265000,
... )
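
Because `release_date` is kept as a string and partial dates are common, downstream code may need to interpret it defensively. The following is a minimal sketch, assuming only ISO 8601-style `YYYY`, `YYYY-MM`, and `YYYY-MM-DD` forms need to be recognised; `parse_partial_date` is a hypothetical helper, not part of the schema.

```python
import re
from datetime import date
from typing import Optional, Tuple

_PARTIAL_DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(value: str) -> Optional[Tuple[int, Optional[int], Optional[int]]]:
    """Return (year, month, day) with None for missing parts, or None if unparseable."""
    match = _PARTIAL_DATE_RE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day = (int(p) if p is not None else None for p in match.groups())
    if month is not None and not 1 <= month <= 12:
        return None
    if day is not None:
        try:
            date(year, month, day)  # rejects impossible dates such as 2005-02-30
        except ValueError:
            return None
    return year, month, day
```

Keeping the missing parts as None, instead of defaulting to January or the first of the month, avoids inventing precision the source did not report.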

Relationship ¶

Bases: BaseModel

Link between entities within a single data source.

Represents a directed edge from the parent NormalizedRecord to another entity identified by its source-specific ID. These relationships are source-local; cross-source relationship resolution happens in the Entity Resolution pipeline, producing ResolvedRelationship objects.

ATTRIBUTE	DESCRIPTION
`relationship_type`	The type of relationship (e.g., PERFORMED, WROTE, PRODUCED). TYPE: `RelationshipTypeEnum`
`target_source`	The data source of the target entity. TYPE: `SourceEnum`
`target_source_id`	Source-specific identifier of the target entity (e.g., a MusicBrainz MBID or Discogs numeric ID as string). TYPE: `str`
`target_entity_type`	The entity type of the target (e.g., ARTIST, RECORDING). TYPE: `EntityTypeEnum`
`attributes`	Additional relationship attributes (e.g., `{"instrument": "piano"}`, `{"begin_date": "2005-01"}`). TYPE: `dict of str to str`

Examples:

>>> rel = Relationship(
...     relationship_type=RelationshipTypeEnum.PERFORMED,
...     target_source=SourceEnum.MUSICBRAINZ,
...     target_source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     target_entity_type=EntityTypeEnum.ARTIST,
...     attributes={"instrument": "vocals"},
... )

NormalizedRecord ¶

Bases: BaseModel

ETL output: normalized music metadata from a single external source.

The NormalizedRecord is the first boundary object (BO-1) in the five-pipeline architecture. All ETL extractors produce NormalizedRecord instances regardless of their source format. Multiple records for the same real-world entity (from different sources) are merged by the Entity Resolution pipeline into a ResolvedEntity.

ATTRIBUTE	DESCRIPTION
`schema_version`	Semantic version of the NormalizedRecord schema. Defaults to `"1.0.0"`. Used for forward/backward compatibility checks. TYPE: `str`
`record_id`	Unique identifier for this record. Auto-generated UUIDv4. TYPE: `UUID`
`source`	Which data source provided this record. TYPE: `SourceEnum`
`source_id`	Source-specific identifier (e.g., MusicBrainz MBID, Discogs release ID as string). TYPE: `str`
`entity_type`	The type of music entity this record represents. TYPE: `EntityTypeEnum`
`canonical_name`	Primary name of the entity as reported by the source. Must be non-empty after whitespace stripping. TYPE: `str`
`alternative_names`	Alternative names, aliases, or transliterations. Used during fuzzy entity resolution. TYPE: `list of str`
`identifiers`	Standard music industry identifiers (ISRC, ISWC, ISNI, etc.). Machine sources (MusicBrainz, Discogs, AcoustID) must provide at least one identifier. TYPE: `IdentifierBundle`
`metadata`	Source-specific metadata (genres, release date, duration, etc.). TYPE: `SourceMetadata`
`relationships`	Links to other entities within the same source. TYPE: `list of Relationship`
`fetch_timestamp`	UTC timestamp when this record was fetched from the source. Must be timezone-aware and not more than 60 seconds in the future (to catch clock skew). TYPE: `datetime`
`source_confidence`	Source-reported confidence in the data, range [0.0, 1.0]. 0.0 = no confidence data available; 1.0 = verified by authority. TYPE: `float`
`raw_payload`	Original API response preserved for debugging and re-processing. May be None if raw data is not retained. TYPE: `dict or None`

Examples:

>>> from datetime import datetime, UTC
>>> record = NormalizedRecord(
...     source=SourceEnum.MUSICBRAINZ,
...     source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     entity_type=EntityTypeEnum.RECORDING,
...     canonical_name="Hide and Seek",
...     identifiers=IdentifierBundle(isrc="GBAYE0601498"),
...     fetch_timestamp=datetime.now(UTC),
...     source_confidence=0.87,
... )

See Also

ResolvedEntity : The next boundary object produced by Entity Resolution.

validate_canonical_name `classmethod` ¶

validate_canonical_name(v: str) -> str

Canonical name must be non-empty after stripping.

Source code in src/music_attribution/schemas/normalized.py

@field_validator("canonical_name")
@classmethod
def validate_canonical_name(cls, v: str) -> str:
    """Canonical name must be non-empty after stripping."""
    if not v.strip():
        msg = "canonical_name must be non-empty after stripping whitespace"
        raise ValueError(msg)
    return v.strip()

validate_fetch_timestamp `classmethod` ¶

validate_fetch_timestamp(v: datetime) -> datetime

Fetch timestamp must be timezone-aware and not far in the future.

Source code in src/music_attribution/schemas/normalized.py

@field_validator("fetch_timestamp")
@classmethod
def validate_fetch_timestamp(cls, v: datetime) -> datetime:
    """Fetch timestamp must be timezone-aware and not far in the future."""
    if v.tzinfo is None:
        msg = "fetch_timestamp must be timezone-aware (UTC)"
        raise ValueError(msg)
    max_future = datetime.now(UTC) + timedelta(seconds=60)
    if v > max_future:
        msg = "fetch_timestamp must not be more than 60 seconds in the future"
        raise ValueError(msg)
    return v
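
The same rule can be expressed without Pydantic for use in tests or pre-validation. This is a stdlib-only sketch that mirrors the two checks in the validator above; the function name and the injectable `now` parameter are assumptions added to make the behaviour deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_acceptable_fetch_timestamp(
    ts: datetime,
    now: Optional[datetime] = None,
    max_skew: timedelta = timedelta(seconds=60),
) -> bool:
    """Mirror the validator: reject naive datetimes and far-future values."""
    if ts.tzinfo is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return ts <= current + max_skew
```

Note that the check only rejects naive datetimes: an aware timestamp with a non-UTC offset passes, since aware datetimes are compared by absolute time.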

validate_identifiers_for_machine_sources ¶

validate_identifiers_for_machine_sources() -> NormalizedRecord

Machine sources require at least one identifier.

Source code in src/music_attribution/schemas/normalized.py

@model_validator(mode="after")
def validate_identifiers_for_machine_sources(self) -> NormalizedRecord:
    """Machine sources require at least one identifier."""
    machine_sources = {SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS, SourceEnum.ACOUSTID}
    if self.source in machine_sources and not self.identifiers.has_any():
        msg = f"Source {self.source} requires at least one identifier in IdentifierBundle"
        raise ValueError(msg)
    return self

Resolved Entity (Resolution Output)¶

resolved ¶

ResolvedEntity boundary object schema (BO-2).

Output of the Entity Resolution pipeline. A unified entity that merges multiple NormalizedRecord instances from different sources into a single canonical entity with resolution confidence and assurance level.

The ResolvedEntity is the second boundary object in the five-pipeline architecture. It carries forward the provenance of every source that contributed to it, enabling downstream attribution scoring to weight sources by reliability.

See Also

music_attribution.schemas.normalized : The preceding boundary object.
music_attribution.schemas.attribution : The next boundary object.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5.

SourceReference ¶

Bases: BaseModel

Reference to a contributing NormalizedRecord.

Links a ResolvedEntity back to the specific NormalizedRecord that contributed to it, preserving full provenance. The agreement score measures how well this source's data aligns with the resolved consensus.

ATTRIBUTE	DESCRIPTION
`record_id`	UUID of the contributing `NormalizedRecord`. TYPE: `UUID`
`source`	Which data source provided the record. TYPE: `SourceEnum`
`source_id`	Source-specific identifier of the record. TYPE: `str`
`agreement_score`	How well this source agrees with the resolved consensus, range [0.0, 1.0]. 1.0 = perfect agreement on all fields; 0.0 = complete disagreement. TYPE: `float`

Examples:

>>> import uuid
>>> ref = SourceReference(
...     record_id=uuid.uuid4(),
...     source=SourceEnum.MUSICBRAINZ,
...     source_id="a74b1b7f-71a5-4011-9441-d0b5e4122711",
...     agreement_score=0.95,
... )

ResolutionDetails ¶

Bases: BaseModel

Per-method confidence breakdown for entity resolution.

Records the confidence contribution from each resolution method that was attempted. Only populated fields were actually used; None means that method was not applied. This enables post-hoc analysis of which methods are most effective for different entity types.

ATTRIBUTE	DESCRIPTION
`string_similarity`	Confidence from fuzzy string matching (Jaro-Winkler, Levenshtein), range [0.0, 1.0]. None if not attempted. TYPE: `float or None`
`embedding_similarity`	Confidence from cosine similarity between semantic embeddings, range [0.0, 1.0]. None if not attempted. TYPE: `float or None`
`graph_path_confidence`	Confidence from graph-based resolution (path length, co-occurrence patterns), range [0.0, 1.0]. None if not attempted. TYPE: `float or None`
`llm_confidence`	Confidence from LLM-assisted resolution, range [0.0, 1.0]. None if not attempted. TYPE: `float or None`
`matched_identifiers`	Names of identifiers that matched exactly (e.g., `["isrc", "mbid"]`). Empty if no exact matches. TYPE: `list of str`

Examples:

>>> details = ResolutionDetails(
...     string_similarity=0.92,
...     matched_identifiers=["isrc"],
... )
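
To give a feel for what a `string_similarity` score looks like, the sketch below computes a normalised name similarity in [0.0, 1.0]. It uses `difflib.SequenceMatcher` from the standard library as a stand-in; this is not Jaro-Winkler or Levenshtein, and it is not claimed to be what the resolution pipeline uses. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity ratio in [0.0, 1.0]."""
    left = " ".join(a.casefold().split())
    right = " ".join(b.casefold().split())
    return SequenceMatcher(None, left, right).ratio()
```

Identical names (after normalisation) score 1.0, while abbreviations such as "I. Heap" against "Imogen Heap" land somewhere in between, which is the kind of value that would be stored in `string_similarity`.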

ResolvedRelationship ¶

Bases: BaseModel

Resolved cross-entity relationship link.

Unlike source-local Relationship objects in NormalizedRecord, a ResolvedRelationship links two ResolvedEntity instances and is backed by one or more corroborating data sources.

ATTRIBUTE	DESCRIPTION
`target_entity_id`	UUID of the target `ResolvedEntity`. TYPE: `UUID`
`relationship_type`	The type of relationship (e.g., PERFORMED, WROTE). TYPE: `RelationshipTypeEnum`
`confidence`	Confidence in this relationship, range [0.0, 1.0]. Higher when multiple sources corroborate the link. TYPE: `float`
`supporting_sources`	Data sources that corroborate this relationship. More sources generally means higher confidence. TYPE: `list of SourceEnum`

Examples:

>>> import uuid
>>> rel = ResolvedRelationship(
...     target_entity_id=uuid.uuid4(),
...     relationship_type=RelationshipTypeEnum.PERFORMED,
...     confidence=0.92,
...     supporting_sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
... )

Conflict ¶

Bases: BaseModel

Unresolved disagreement between data sources.

When entity resolution encounters contradictory information from different sources for the same field, it records a Conflict rather than silently choosing one value. Conflicts with severity HIGH or CRITICAL trigger needs_review = True on the parent ResolvedEntity.

ATTRIBUTE	DESCRIPTION
`field`	Name of the field in conflict (e.g., `"canonical_name"`, `"release_date"`). TYPE: `str`
`values`	Mapping of source name to its reported value. Keys are source identifiers (e.g., `"MUSICBRAINZ"`), values are the conflicting field values. TYPE: `dict of str to str`
`severity`	How severe the disagreement is, from LOW (auto-resolvable) to CRITICAL (blocks attribution). TYPE: `ConflictSeverityEnum`

Examples:

>>> conflict = Conflict(
...     field="canonical_name",
...     values={"MUSICBRAINZ": "Imogen Heap", "DISCOGS": "I. Heap"},
...     severity=ConflictSeverityEnum.LOW,
... )
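
The review rule described above (HIGH or CRITICAL conflicts set `needs_review`) can be sketched as follows. This is an assumption-laden illustration: the `Severity` enum is a local stand-in for `ConflictSeverityEnum`, its `MEDIUM` member and string values are guesses, and `review_decision` is a hypothetical helper.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Severity(str, Enum):
    """Local stand-in for ConflictSeverityEnum; values are assumed."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


REVIEW_SEVERITIES = {Severity.HIGH, Severity.CRITICAL}


def review_decision(conflicts: Iterable[Tuple[str, Severity]]) -> Tuple[bool, Optional[str]]:
    """Map (field, severity) pairs to (needs_review, review_reason)."""
    flagged = sorted({field for field, severity in conflicts if severity in REVIEW_SEVERITIES})
    if not flagged:
        return False, None
    return True, "High-severity conflicts in: " + ", ".join(flagged)
```

Producing the reason alongside the flag keeps the pair consistent with the requirement that `review_reason` be set whenever `needs_review` is True.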

ResolvedEntity ¶

Bases: BaseModel

Unified entity resolved from multiple data sources.

The ResolvedEntity is the second boundary object (BO-2) in the five-pipeline architecture. It is produced by the Entity Resolution pipeline and consumed by the Attribution Engine. Each instance represents a single real-world music entity (artist, recording, work, etc.) with a canonical identity established by merging one or more NormalizedRecord instances.

ATTRIBUTE	DESCRIPTION
`schema_version`	Semantic version of the ResolvedEntity schema. Defaults to `"1.0.0"`. TYPE: `str`
`entity_id`	Unique identifier for this resolved entity. Auto-generated UUIDv4. TYPE: `UUID`
`entity_type`	The type of music entity (RECORDING, WORK, ARTIST, etc.). TYPE: `EntityTypeEnum`
`canonical_name`	Best-consensus name for the entity, chosen from contributing sources by the resolution algorithm. TYPE: `str`
`alternative_names`	All other names/aliases from contributing sources, used for future matching and display. TYPE: `list of str`
`identifiers`	Merged identifier bundle combining identifiers from all contributing sources. TYPE: `IdentifierBundle`
`source_records`	References to all `NormalizedRecord` instances that were merged into this entity. Must contain at least one. TYPE: `list of SourceReference`
`resolution_method`	Primary method used to resolve/merge the source records. TYPE: `ResolutionMethodEnum`
`resolution_confidence`	Overall confidence in the resolution, range [0.0, 1.0]. This is the resolution pipeline's assessment of how likely it is that all merged records truly refer to the same entity. TYPE: `float`
`resolution_details`	Per-method confidence breakdown showing which methods contributed and their individual confidence scores. TYPE: `ResolutionDetails`
`assurance_level`	A0-A3 assurance level determined by the number and quality of corroborating sources. See Teikari (2026), section 6. TYPE: `AssuranceLevelEnum`
`relationships`	Cross-entity links resolved from source-local relationships. TYPE: `list of ResolvedRelationship`
`conflicts`	Unresolved disagreements between sources. May trigger human review if severity is HIGH or CRITICAL. TYPE: `list of Conflict`
`needs_review`	Flag indicating this entity requires human review before attribution scoring proceeds. TYPE: `bool`
`review_reason`	Human-readable explanation of why review is needed. Required when `needs_review` is True. TYPE: `str or None`
`merged_from`	If this entity was formed by merging previously separate `ResolvedEntity` instances, their IDs are listed here. TYPE: `list of UUID or None`
`resolved_at`	UTC timestamp when resolution was performed. Must be timezone-aware. TYPE: `datetime`

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> entity = ResolvedEntity(
...     entity_type=EntityTypeEnum.RECORDING,
...     canonical_name="Hide and Seek",
...     source_records=[
...         SourceReference(
...             record_id=uuid.uuid4(),
...             source=SourceEnum.MUSICBRAINZ,
...             source_id="abc-123",
...             agreement_score=0.95,
...         ),
...     ],
...     resolution_method=ResolutionMethodEnum.EXACT_ID,
...     resolution_confidence=0.98,
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
...     resolved_at=datetime.now(UTC),
... )

See Also

NormalizedRecord : The preceding boundary object from ETL.
AttributionRecord : The next boundary object from Attribution Engine.

validate_resolved_at `classmethod` ¶

validate_resolved_at(v: datetime) -> datetime

Resolved timestamp must be timezone-aware.

Source code in src/music_attribution/schemas/resolved.py

@field_validator("resolved_at")
@classmethod
def validate_resolved_at(cls, v: datetime) -> datetime:
    """Resolved timestamp must be timezone-aware."""
    if v.tzinfo is None:
        msg = "resolved_at must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_review_fields ¶

validate_review_fields() -> ResolvedEntity

If needs_review is True, review_reason must be provided.

Source code in src/music_attribution/schemas/resolved.py

@model_validator(mode="after")
def validate_review_fields(self) -> ResolvedEntity:
    """If needs_review is True, review_reason must be provided."""
    if self.needs_review and self.review_reason is None:
        msg = "review_reason must be non-None when needs_review is True"
        raise ValueError(msg)
    return self

Attribution Record (Engine Output)¶

attribution ¶

AttributionRecord boundary object schema (BO-3).

Output of the Attribution Engine pipeline. A complete attribution record for a musical work/recording with calibrated confidence scores, conformal prediction sets, and a full provenance chain.

The AttributionRecord is the third boundary object in the five-pipeline architecture and the primary output consumed by the API/MCP Server and Chat Interface pipelines. It carries the complete audit trail of how an attribution was constructed, enabling transparent confidence communication to end users.

See Also

music_attribution.schemas.resolved : The preceding boundary object.
music_attribution.schemas.feedback : Reverse-flow feedback from users.
music_attribution.schemas.uncertainty : Uncertainty decomposition models.
Teikari, P. (2026). Music Attribution with Transparent Confidence, sections 5-6.

EventDetails `module-attribute` ¶

EventDetails = Annotated[
    FetchEventDetails
    | ResolveEventDetails
    | ScoreEventDetails
    | ReviewEventDetails
    | UpdateEventDetails
    | FeedbackEventDetails,
    Field(discriminator="type"),
]

Discriminated union of provenance event detail types.

Uses Pydantic's discriminator field (type) to deserialize into the correct detail class. Each variant corresponds to a ProvenanceEventTypeEnum value.
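
Conceptually, a discriminated union reads one field and uses it to pick the class to build. The sketch below imitates that dispatch with plain dataclasses; it is not Pydantic's implementation, and the two simplified classes (`FetchDetails`, `ScoreDetails`) and the `parse_details` helper are hypothetical stand-ins that carry only a subset of the real fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchDetails:
    source: str
    records_fetched: int
    type: str = "fetch"


@dataclass(frozen=True)
class ScoreDetails:
    new_confidence: float
    scoring_method: str
    type: str = "score"


# Registry keyed by the discriminator value.
_DETAIL_TYPES = {"fetch": FetchDetails, "score": ScoreDetails}


def parse_details(payload: dict):
    """Select the detail class from payload["type"] and construct it."""
    try:
        cls = _DETAIL_TYPES[payload["type"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing detail type: {payload.get('type')!r}") from exc
    return cls(**payload)
```

Selecting by a single tag avoids trying every variant in turn, so an invalid payload produces one targeted error instead of one per union member.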

Credit ¶

Bases: BaseModel

Attribution credit for a single entity-role pair.

Represents one line item in the attribution: a specific entity (artist, producer, etc.) credited in a specific role on a musical work or recording. Each credit carries its own confidence score and assurance level, independent of the overall record.

ATTRIBUTE	DESCRIPTION
`entity_id`	UUID of the `ResolvedEntity` receiving this credit. TYPE: `UUID`
`entity_name`	Display name of the credited entity. Defaults to empty string; populated for API/UI convenience. TYPE: `str`
`role`	The role in which the entity is credited (e.g., PERFORMER, SONGWRITER, PRODUCER). TYPE: `CreditRoleEnum`
`role_detail`	Additional role detail not captured by the enum (e.g., `"lead vocals"`, `"bass guitar"`). TYPE: `str or None`
`confidence`	Confidence in this specific credit assignment, range [0.0, 1.0]. May differ from the overall record confidence. TYPE: `float`
`sources`	Data sources that corroborate this credit. More sources generally yield higher confidence. TYPE: `list of SourceEnum`
`assurance_level`	A0-A3 assurance level for this specific credit. TYPE: `AssuranceLevelEnum`

Examples:

>>> import uuid
>>> credit = Credit(
...     entity_id=uuid.uuid4(),
...     entity_name="Imogen Heap",
...     role=CreditRoleEnum.PERFORMER,
...     role_detail="lead vocals, keyboards",
...     confidence=0.95,
...     sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
... )

ConformalSet ¶

Bases: BaseModel

Conformal prediction set at a specified coverage level.

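Instead of a single point prediction for each credit role, conformal prediction produces a set of plausible roles that contains the true role with probability at least the coverage level. The guarantee is marginal (it holds on average across entities, not for each entity individually) and assumes that calibration and test data are exchangeable. Smaller sets indicate higher confidence. See Teikari (2026), section 5.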

ATTRIBUTE	DESCRIPTION
`coverage_level`	Target coverage probability, range (0.0, 1.0) exclusive. Typical values: 0.90 (90% coverage) or 0.95 (95% coverage). TYPE: `float`
`prediction_sets`	Mapping of entity ID (as string) to the set of plausible roles for that entity. Smaller sets = more certain attribution. TYPE: `dict of str to list of CreditRoleEnum`
`set_sizes`	Mapping of entity ID to the cardinality of their prediction set. A set size of 1 means the role is unambiguous at the given coverage level. TYPE: `dict of str to int`
`marginal_coverage`	Observed marginal coverage on the calibration set, range [0.0, 1.0]. Should be close to `coverage_level` if well calibrated. TYPE: `float`
`calibration_error`	Absolute difference between `coverage_level` and `marginal_coverage`. Lower is better. Non-negative. TYPE: `float`
`calibration_method`	Name of the calibration method used (e.g., `"split_conformal"`, `"jackknife_plus"`). TYPE: `str`
`calibration_set_size`	Number of examples in the calibration set. Larger sets give tighter coverage guarantees. Non-negative. TYPE: `int`

Examples:

>>> conformal = ConformalSet(
...     coverage_level=0.90,
...     prediction_sets={"entity-uuid": [CreditRoleEnum.PERFORMER]},
...     set_sizes={"entity-uuid": 1},
...     marginal_coverage=0.91,
...     calibration_error=0.01,
...     calibration_method="split_conformal",
...     calibration_set_size=500,
... )
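
Two of the fields are simple functions of the others: `set_sizes` is the cardinality of each prediction set, and `calibration_error` is the absolute gap between target and observed coverage. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and whether the schema derives or validates these values itself is not documented here.

```python
from typing import Dict, List, Tuple


def summarize_prediction_sets(
    prediction_sets: Dict[str, List[str]],
    coverage_level: float,
    marginal_coverage: float,
) -> Tuple[Dict[str, int], float]:
    """Return (set_sizes, calibration_error) as described in the table above."""
    # Count distinct roles per entity; duplicates do not enlarge the set.
    set_sizes = {entity_id: len(set(roles)) for entity_id, roles in prediction_sets.items()}
    calibration_error = abs(coverage_level - marginal_coverage)
    return set_sizes, calibration_error
```

With a target of 0.90 and an observed coverage of 0.91, the error is 0.01 up to floating-point rounding, matching the example above.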

FetchEventDetails ¶

Bases: BaseModel

Details for FETCH provenance events.

Records metadata about a data fetch operation from an external source as part of the provenance chain.

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field for the `EventDetails` union. Always `"fetch"`. TYPE: `Literal['fetch']`
`source`	The data source that was queried. TYPE: `SourceEnum`
`source_id`	Source-specific query identifier or endpoint. TYPE: `str`
`records_fetched`	Number of records returned by the fetch. Non-negative. TYPE: `int`
`rate_limited`	Whether the fetch was rate-limited by the source API. TYPE: `bool`

ResolveEventDetails ¶

Bases: BaseModel

Details for RESOLVE provenance events.

Records metadata about an entity resolution step, including the method used and the reduction ratio (input records to output entities).

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field. Always `"resolve"`. TYPE: `Literal['resolve']`
`method`	Name of the resolution method or algorithm used. TYPE: `str`
`records_input`	Number of `NormalizedRecord` instances fed into resolution. Non-negative. TYPE: `int`
`entities_output`	Number of `ResolvedEntity` instances produced. Non-negative. Should be <= `records_input`. TYPE: `int`
`confidence_range`	(min, max) confidence range across all output entities. Defaults to `(0.0, 1.0)`. TYPE: `tuple of (float, float)`
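
The reduction ratio mentioned above can be computed from `records_input` and `entities_output`. The sketch below assumes the ratio is defined as input records per output entity and treats `entities_output > records_input` as an error, which is stricter than the "should" wording in the table; `reduction_ratio` is a hypothetical helper.

```python
from typing import Optional


def reduction_ratio(records_input: int, entities_output: int) -> Optional[float]:
    """Input records per output entity; None when no entities were produced."""
    if records_input < 0 or entities_output < 0:
        raise ValueError("counts must be non-negative")
    if entities_output > records_input:
        raise ValueError("entities_output should not exceed records_input")
    if entities_output == 0:
        return None
    return records_input / entities_output
```

A ratio of 1.0 means no records were merged; larger values indicate more aggressive merging.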

ScoreEventDetails ¶

Bases: BaseModel

Details for SCORE provenance events.

Records a confidence scoring or recalibration step, showing how the confidence value changed.

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field. Always `"score"`. TYPE: `Literal['score']`
`previous_confidence`	Confidence before this scoring step, range [0.0, 1.0]. None for the initial scoring event. TYPE: `float or None`
`new_confidence`	Confidence after this scoring step, range [0.0, 1.0]. TYPE: `float`
`scoring_method`	Name of the scoring/calibration method applied (e.g., `"source_weighted_average"`, `"platt_scaling"`). TYPE: `str`

ReviewEventDetails ¶

Bases: BaseModel

Details for REVIEW provenance events.

Records that a human reviewer examined the attribution and optionally applied corrections from a FeedbackCard.

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field. Always `"review"`. TYPE: `Literal['review']`
`reviewer_id`	Identifier of the reviewer who performed the review. TYPE: `str`
`feedback_card_id`	UUID of the `FeedbackCard` that was applied. TYPE: `UUID`
`corrections_applied`	Number of field corrections accepted from the feedback card. Non-negative. Zero means the reviewer confirmed the record without changes. TYPE: `int`

UpdateEventDetails ¶

Bases: BaseModel

Details for UPDATE provenance events.

Records a version bump on the attribution record, including which fields changed and what triggered the update.

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field. Always `"update"`. TYPE: `Literal['update']`
`previous_version`	Version number before this update. Minimum 1. TYPE: `int`
`new_version`	Version number after this update. Minimum 1. Should be `previous_version + 1`. TYPE: `int`
`fields_changed`	Names of fields that were modified in this update. TYPE: `list of str`
`trigger`	What triggered the update (e.g., `"feedback_accepted"`, `"source_refresh"`, `"conflict_resolved"`). TYPE: `str`

FeedbackEventDetails ¶

Bases: BaseModel

Details for FEEDBACK provenance events.

Records that a FeedbackCard was processed by the Attribution Engine and its corrections were either accepted or rejected.

ATTRIBUTE	DESCRIPTION
`type`	Discriminator field. Always `"feedback"`. TYPE: `Literal['feedback']`
`feedback_card_id`	UUID of the `FeedbackCard` that was processed. TYPE: `UUID`
`overall_assessment`	The reviewer's overall assessment score from the feedback card, range [0.0, 1.0]. TYPE: `float`
`corrections_count`	Number of corrections in the feedback card. Non-negative. TYPE: `int`
`accepted`	Whether the feedback was accepted and applied to the attribution record. TYPE: `bool`

ProvenanceEvent ¶

Bases: BaseModel

Single event in the attribution provenance audit trail.

Each ProvenanceEvent records one discrete action that contributed to or modified an attribution record. The chain of events forms an immutable audit trail enabling full transparency of how an attribution was constructed and refined.

ATTRIBUTE	DESCRIPTION
`event_type`	High-level event type (FETCH, RESOLVE, SCORE, REVIEW, UPDATE, FEEDBACK). TYPE: `ProvenanceEventTypeEnum`
`timestamp`	UTC timestamp when this event occurred. Must be timezone-aware. TYPE: `datetime`
`agent`	Identifier of the software agent or human that performed this action (e.g., `"etl-musicbrainz-v1.2"`, `"reviewer-jdoe"`). TYPE: `str`
`details`	Typed event details (discriminated union). The concrete type matches `event_type`. TYPE: `EventDetails`
`feedback_card_id`	UUID of the associated `FeedbackCard`, if this event was triggered by user feedback. TYPE: `UUID or None`
`step_uncertainty`	Uncertainty decomposition for this specific pipeline step, if available. TYPE: `StepUncertainty or None`
`citation_index`	1-based citation index for referencing this event in chat responses. None if not cited. Minimum 1 when set. TYPE: `int or None`

validate_timestamp `classmethod` ¶

validate_timestamp(v: datetime) -> datetime

Timestamp must be timezone-aware.

Source code in src/music_attribution/schemas/attribution.py

@field_validator("timestamp")
@classmethod
def validate_timestamp(cls, v: datetime) -> datetime:
    """Timestamp must be timezone-aware."""
    if v.tzinfo is None:
        msg = "timestamp must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

AttributionRecord ¶

Bases: BaseModel

Complete attribution record for a musical work or recording.

The AttributionRecord is the third boundary object (BO-3) in the five-pipeline architecture and the primary deliverable of the Attribution Engine. It is consumed by the API/MCP Server (for permission queries) and the Chat Interface (for user-facing attribution display).

Each record contains: (1) a list of credits with per-credit confidence, (2) conformal prediction sets providing coverage guarantees, (3) an immutable provenance chain, and (4) an optional uncertainty summary. See Teikari (2026), sections 5-6.

ATTRIBUTE	DESCRIPTION
`schema_version`	Semantic version of the AttributionRecord schema. Defaults to `"1.0.0"`. TYPE: `str`
`attribution_id`	Unique identifier for this attribution record. Auto-generated UUIDv4. TYPE: `UUID`
`work_entity_id`	UUID of the `ResolvedEntity` (work or recording) that this attribution describes. TYPE: `UUID`
`work_title`	Display title of the work. Populated for API/UI convenience. TYPE: `str`
`artist_name`	Display name of the primary artist. Populated for API/UI convenience. TYPE: `str`
`credits`	Attribution credits. Must contain at least one credit. Each credit links an entity to a role with confidence scoring. TYPE: `list of Credit`
`assurance_level`	Overall A0-A3 assurance level for this attribution record, determined by the weakest link in the evidence chain. TYPE: `AssuranceLevelEnum`
`confidence_score`	Overall calibrated confidence score, range [0.0, 1.0]. Aggregated from per-credit confidences and source agreement. TYPE: `float`
`conformal_set`	Conformal prediction set providing coverage guarantees on role assignments. TYPE: `ConformalSet`
`source_agreement`	Degree of agreement across data sources, range [0.0, 1.0]. 1.0 = all sources agree on all credits; 0.0 = total disagreement. TYPE: `float`
`provenance_chain`	Ordered list of provenance events forming the audit trail. Events are appended chronologically. TYPE: `list of ProvenanceEvent`
`uncertainty_summary`	Aggregated uncertainty decomposition across all pipeline steps. None if uncertainty tracking is not enabled. TYPE: `UncertaintyAwareProvenance or None`
`needs_review`	Flag indicating this record requires human review before being surfaced to end users. TYPE: `bool`
`review_priority`	Priority score for the review queue, range [0.0, 1.0]. Higher values = more urgent review needed. TYPE: `float`
`created_at`	UTC timestamp when this record was first created. Must be timezone-aware. TYPE: `datetime`
`updated_at`	UTC timestamp of the most recent update. Must be timezone-aware. Must be >= `created_at`. TYPE: `datetime`
`version`	Monotonically increasing version number. Minimum 1. Bumped on every update. TYPE: `int`

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> record = AttributionRecord(
...     work_entity_id=uuid.uuid4(),
...     work_title="Hide and Seek",
...     artist_name="Imogen Heap",
...     credits=[
...         Credit(
...             entity_id=uuid.uuid4(),
...             entity_name="Imogen Heap",
...             role=CreditRoleEnum.PERFORMER,
...             confidence=0.95,
...             sources=[SourceEnum.MUSICBRAINZ, SourceEnum.DISCOGS],
...             assurance_level=AssuranceLevelEnum.LEVEL_2,
...         ),
...     ],
...     assurance_level=AssuranceLevelEnum.LEVEL_2,
...     confidence_score=0.92,
...     conformal_set=ConformalSet(
...         coverage_level=0.90,
...         marginal_coverage=0.91,
...         calibration_error=0.01,
...         calibration_method="split_conformal",
...         calibration_set_size=500,
...     ),
...     source_agreement=0.88,
...     review_priority=0.1,
...     created_at=datetime.now(UTC),
...     updated_at=datetime.now(UTC),
...     version=1,
... )

See Also

ResolvedEntity : The preceding boundary object from Entity Resolution.
FeedbackCard : Reverse-flow feedback for calibration updates.
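
The "weakest link" rule for `assurance_level` can be illustrated with a small stdlib-only sketch. It assumes the evidence chain is represented by the per-credit assurance levels and that levels are ordered A0 < A3; `AssuranceLevel` and `weakest_link` are hypothetical names, and the real `AssuranceLevelEnum` is a StrEnum whose members may not compare this way.

```python
from enum import IntEnum


class AssuranceLevel(IntEnum):
    """Hypothetical ordered stand-in for AssuranceLevelEnum (A0 lowest, A3 highest)."""

    LEVEL_0 = 0
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3


def weakest_link(levels: list[AssuranceLevel]) -> AssuranceLevel:
    """Return the lowest assurance level among the inputs."""
    if not levels:
        raise ValueError("at least one assurance level is required")
    return min(levels)


# Assumed per-credit levels for one record; the overall level is the minimum.
credit_levels = [AssuranceLevel.LEVEL_2, AssuranceLevel.LEVEL_3, AssuranceLevel.LEVEL_1]
overall = weakest_link(credit_levels)
```

Under these assumptions a single A1 credit caps the whole record at A1, regardless of how strong the other credits are.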

validate_timestamps `classmethod` ¶

validate_timestamps(v: datetime) -> datetime

Timestamps must be timezone-aware.

Source code in src/music_attribution/schemas/attribution.py

@field_validator("created_at", "updated_at")
@classmethod
def validate_timestamps(cls, v: datetime) -> datetime:
    """Timestamps must be timezone-aware."""
    if v.tzinfo is None:
        msg = "Timestamps must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_updated_after_created ¶

validate_updated_after_created() -> AttributionRecord

updated_at must be >= created_at.

Source code in src/music_attribution/schemas/attribution.py

@model_validator(mode="after")
def validate_updated_after_created(self) -> AttributionRecord:
    """updated_at must be >= created_at."""
    if self.updated_at < self.created_at:
        msg = "updated_at must be >= created_at"
        raise ValueError(msg)
    return self
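
The two timestamp rules above can be exercised without the full model. The following is a minimal sketch using a plain function in place of the Pydantic validators; `check_timestamps` is a hypothetical helper, not part of the package.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(created_at: datetime, updated_at: datetime) -> None:
    """Raise ValueError if either documented timestamp rule is violated."""
    # Awareness is checked first: comparing a naive and an aware datetime
    # would otherwise raise TypeError before the ordering rule is reached.
    for name, value in (("created_at", created_at), ("updated_at", updated_at)):
        if value.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware (UTC)")
    if updated_at < created_at:
        raise ValueError("updated_at must be >= created_at")


now = datetime.now(timezone.utc)
check_timestamps(now, now + timedelta(seconds=1))  # passes silently

errors = []
for created, updated in [
    (datetime(2026, 1, 1), now),      # naive created_at
    (now, now - timedelta(days=1)),   # updated before created
]:
    try:
        check_timestamps(created, updated)
    except ValueError as exc:
        errors.append(str(exc))
```

Equal timestamps are accepted, which matches the `>=` in the documented rule.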

Permissions¶

permissions ¶

PermissionBundle boundary object schema (BO-5).

Machine-readable permission specification for MCP consent queries. Implements the Permission Patchbay from Teikari (2026), section 7.

The PermissionBundle enables AI platforms and other consumers to programmatically query whether specific uses of a musical work are permitted, under what conditions, and who authorised the permission. This is the MCP-native alternative to robots.txt for audio content.

See Also

music_attribution.schemas.enums : Permission-related enums.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 7 (Permission Patchbay).

PermissionCondition ¶

Bases: BaseModel

Optional condition attached to a permission entry.

Conditions qualify a permission with additional constraints. For example, a permission might only apply in certain territories or time periods.

ATTRIBUTE	DESCRIPTION
`condition_type`	Type of condition (e.g., `"territory"`, `"date_range"`, `"max_duration_seconds"`, `"non_commercial_only"`). TYPE: `str`
`value`	Condition value as a string. Format depends on `condition_type` (e.g., `"US,GB,DE"` for territory, `"2025-01-01/2026-12-31"` for date range). TYPE: `str`

Examples:

>>> cond = PermissionCondition(
...     condition_type="territory",
...     value="US,GB,DE",
... )
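
Because `value` is always a string, consumers have to parse it according to `condition_type`. The sketch below handles the two formats shown in the attribute table, assuming they are exactly comma-separated country codes and an ISO date pair separated by `/`; `parse_territory` and `parse_date_range` are hypothetical helpers.

```python
from datetime import date


def parse_territory(value: str) -> list[str]:
    """Split a comma-separated territory string into upper-case codes."""
    return [code.strip().upper() for code in value.split(",") if code.strip()]


def parse_date_range(value: str) -> tuple[date, date]:
    """Parse 'YYYY-MM-DD/YYYY-MM-DD' into a (start, end) pair."""
    start_text, end_text = value.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if start > end:
        raise ValueError("date range start must not be after its end")
    return start, end


territories = parse_territory("US,GB,DE")
window = parse_date_range("2025-01-01/2026-12-31")
```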

PermissionEntry ¶

Bases: BaseModel

A single permission with optional conditions and requirements.

Each entry specifies a permission type (what use), a value (allow, deny, ask, or conditional), and optional requirements. Conditional values (ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY) require their respective fields to be populated.

ATTRIBUTE	DESCRIPTION
`permission_type`	What kind of use this permission governs (e.g., AI_TRAINING, STREAM, REMIX). TYPE: `PermissionTypeEnum`
`value`	The permission decision (ALLOW, DENY, ASK, ALLOW_WITH_ATTRIBUTION, ALLOW_WITH_ROYALTY). TYPE: `PermissionValueEnum`
`conditions`	Additional conditions qualifying this permission. TYPE: `list of PermissionCondition`
`royalty_rate`	Royalty rate as a decimal (e.g., `Decimal("0.015")` for 1.5%). Required when `value` is ALLOW_WITH_ROYALTY; must be > 0. TYPE: `Decimal or None`
`attribution_requirement`	Required attribution text or format. Required when `value` is ALLOW_WITH_ATTRIBUTION. TYPE: `str or None`
`territory`	ISO 3166-1 alpha-2 country codes where this permission applies. None means worldwide. TYPE: `list of str or None`

Examples:

>>> from decimal import Decimal
>>> entry = PermissionEntry(
...     permission_type=PermissionTypeEnum.AI_TRAINING,
...     value=PermissionValueEnum.ALLOW_WITH_ROYALTY,
...     royalty_rate=Decimal("0.015"),
...     territory=["US", "GB"],
... )

validate_conditional_fields ¶

validate_conditional_fields() -> PermissionEntry

Validate fields required by specific permission values.

Source code in src/music_attribution/schemas/permissions.py

@model_validator(mode="after")
def validate_conditional_fields(self) -> PermissionEntry:
    """Validate fields required by specific permission values."""
    if self.value == PermissionValueEnum.ALLOW_WITH_ROYALTY and (
        self.royalty_rate is None or self.royalty_rate <= 0
    ):
        msg = "royalty_rate must be > 0 when value is ALLOW_WITH_ROYALTY"
        raise ValueError(msg)
    if self.value == PermissionValueEnum.ALLOW_WITH_ATTRIBUTION and self.attribution_requirement is None:
        msg = "attribution_requirement must be non-None when value is ALLOW_WITH_ATTRIBUTION"
        raise ValueError(msg)
    return self

DelegationEntry ¶

Bases: BaseModel

An entry in the permission delegation chain.

Models the chain of authority from the rights owner through intermediaries (manager, label, distributor). Each entry specifies what authority the entity has over the permissions.

ATTRIBUTE	DESCRIPTION
`entity_id`	UUID of the entity in the delegation chain (a `ResolvedEntity` of type ARTIST, LABEL, etc.). TYPE: `UUID`
`role`	The entity's role in the delegation chain (OWNER, MANAGER, LABEL, DISTRIBUTOR). TYPE: `DelegationRoleEnum`
`can_modify`	Whether this entity can modify permission entries. Typically True for OWNER and MANAGER, False for DISTRIBUTOR. TYPE: `bool`
`can_delegate`	Whether this entity can further delegate authority to another entity. TYPE: `bool`

Examples:

>>> import uuid
>>> entry = DelegationEntry(
...     entity_id=uuid.uuid4(),
...     role=DelegationRoleEnum.OWNER,
...     can_modify=True,
...     can_delegate=True,
... )

PermissionBundle ¶

Bases: BaseModel

Machine-readable permission specification (BO-5).

The PermissionBundle is the boundary object used by the API/MCP Server pipeline to answer permission queries from AI platforms and other consumers. It implements the Permission Patchbay design from Teikari (2026), section 7.

Each bundle specifies permissions at a given scope (catalog, release, recording, or work) with an effective date range, a delegation chain showing who authorised the permissions, and a default permission for unlisted permission types.

ATTRIBUTE	DESCRIPTION
`schema_version`	Semantic version of the PermissionBundle schema. Defaults to `"1.0.0"`. TYPE: `str`
`permission_id`	Unique identifier for this permission bundle. Auto-generated UUIDv4. TYPE: `UUID`
`entity_id`	UUID of the rights holder entity (artist, label, publisher). TYPE: `UUID`
`scope`	Granularity of this permission bundle (CATALOG, RELEASE, RECORDING, WORK). TYPE: `PermissionScopeEnum`
`scope_entity_id`	UUID of the specific entity this permission applies to. Must be None when scope is CATALOG; must be non-None otherwise. TYPE: `UUID or None`
`permissions`	List of individual permission entries. Must contain at least one. TYPE: `list of PermissionEntry`
`effective_from`	UTC timestamp from which this bundle is effective. Must be timezone-aware. TYPE: `datetime`
`effective_until`	UTC timestamp until which this bundle is effective. None means no expiry. Must be after `effective_from` when set. TYPE: `datetime or None`
`delegation_chain`	Chain of authority from the rights owner through intermediaries. TYPE: `list of DelegationEntry`
`default_permission`	Default response for permission types not explicitly listed in `permissions`. Typically ASK or DENY. TYPE: `PermissionValueEnum`
`created_by`	UUID of the entity that created this permission bundle. TYPE: `UUID`
`updated_at`	UTC timestamp of the most recent update. Must be timezone-aware. TYPE: `datetime`
`version`	Monotonically increasing version number. Minimum 1. TYPE: `int`

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> from decimal import Decimal
>>> bundle = PermissionBundle(
...     entity_id=uuid.uuid4(),
...     scope=PermissionScopeEnum.CATALOG,
...     permissions=[
...         PermissionEntry(
...             permission_type=PermissionTypeEnum.AI_TRAINING,
...             value=PermissionValueEnum.DENY,
...         ),
...     ],
...     effective_from=datetime.now(UTC),
...     default_permission=PermissionValueEnum.ASK,
...     created_by=uuid.uuid4(),
...     updated_at=datetime.now(UTC),
...     version=1,
... )

See Also

PermissionEntry : Individual permission with conditions.
DelegationEntry : Authority chain for permission provenance.

validate_timestamps `classmethod` ¶

validate_timestamps(v: datetime) -> datetime

Timestamps must be timezone-aware.

Source code in src/music_attribution/schemas/permissions.py

@field_validator("effective_from", "updated_at")
@classmethod
def validate_timestamps(cls, v: datetime) -> datetime:
    """Timestamps must be timezone-aware."""
    if v.tzinfo is None:
        msg = "Timestamps must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_effective_until `classmethod` ¶

validate_effective_until(
    v: datetime | None,
) -> datetime | None

effective_until must be timezone-aware when set.

Source code in src/music_attribution/schemas/permissions.py

@field_validator("effective_until")
@classmethod
def validate_effective_until(cls, v: datetime | None) -> datetime | None:
    """effective_until must be timezone-aware when set."""
    if v is not None and v.tzinfo is None:
        msg = "effective_until must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_scope_consistency ¶

validate_scope_consistency() -> PermissionBundle

scope_entity_id must be None for CATALOG, required for others.

Source code in src/music_attribution/schemas/permissions.py

@model_validator(mode="after")
def validate_scope_consistency(self) -> PermissionBundle:
    """scope_entity_id must be None for CATALOG, required for others."""
    if self.scope == PermissionScopeEnum.CATALOG and self.scope_entity_id is not None:
        msg = "scope_entity_id must be None when scope is CATALOG"
        raise ValueError(msg)
    if self.scope != PermissionScopeEnum.CATALOG and self.scope_entity_id is None:
        msg = f"scope_entity_id must be non-None when scope is {self.scope}"
        raise ValueError(msg)
    return self

validate_effective_range ¶

validate_effective_range() -> PermissionBundle

effective_from must be before effective_until.

Source code in src/music_attribution/schemas/permissions.py

@model_validator(mode="after")
def validate_effective_range(self) -> PermissionBundle:
    """effective_from must be before effective_until."""
    if self.effective_until is not None and self.effective_from >= self.effective_until:
        msg = "effective_from must be before effective_until"
        raise ValueError(msg)
    return self
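
How a consumer might answer a permission query from a bundle is sketched below, under stated assumptions: dataclasses stand in for the Pydantic models, permission types and values are plain strings, and `effective_until` is treated as exclusive (the schema only documents that it must be after `effective_from`). `Entry`, `Bundle`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for PermissionEntry (type and value only)."""

    permission_type: str
    value: str


@dataclass
class Bundle:
    """Hypothetical stand-in for PermissionBundle."""

    permissions: list[Entry]
    default_permission: str
    effective_from: datetime
    effective_until: Optional[datetime] = None


def resolve(bundle: Bundle, permission_type: str, at: datetime) -> Optional[str]:
    """Return the decision at time `at`, or None if the bundle is not in effect."""
    if at < bundle.effective_from:
        return None
    if bundle.effective_until is not None and at >= bundle.effective_until:
        return None
    for entry in bundle.permissions:
        if entry.permission_type == permission_type:
            return entry.value
    # Unlisted permission types fall back to the bundle default.
    return bundle.default_permission


now = datetime.now(timezone.utc)
bundle = Bundle(
    permissions=[Entry("AI_TRAINING", "DENY")],
    default_permission="ASK",
    effective_from=now,
)
```

With this bundle, `resolve(bundle, "AI_TRAINING", now)` yields the explicit entry, while an unlisted type such as `"STREAM"` falls back to `default_permission`.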

Feedback¶

feedback ¶

FeedbackCard boundary object schema (BO-4).

Structured feedback from domain experts (artists, managers, musicologists, producers). Flows from the Chat Interface back into the Attribution Engine for calibration updates. Ref: Zhou et al., 2023 -- FeedbackCards.

The FeedbackCard is the primary reverse-flow boundary object in the five-pipeline architecture, enabling human-in-the-loop calibration. When a domain expert reviews an attribution and provides corrections, these are captured in a FeedbackCard that feeds back into the Attribution Engine for confidence recalibration.

See Also

music_attribution.schemas.attribution : The AttributionRecord being reviewed.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 6.

Correction ¶

Bases: BaseModel

A specific correction to an attribution field.

Represents a single field-level correction proposed by a reviewer. Each correction includes the current (incorrect) value, the proposed corrected value, and the reviewer's confidence in their correction.

ATTRIBUTE	DESCRIPTION
`field`	Name of the field being corrected (e.g., `"role"`, `"entity_name"`, `"confidence"`). TYPE: `str`
`current_value`	The current (incorrect) value of the field as a string. TYPE: `str`
`corrected_value`	The proposed correct value as a string. TYPE: `str`
`entity_id`	UUID of the specific entity this correction applies to, if the correction is entity-specific (e.g., correcting a credit role). None for record-level corrections. TYPE: `UUID or None`
`confidence_in_correction`	The reviewer's confidence that their correction is accurate, range [0.0, 1.0]. Higher values from authoritative reviewers (e.g., the artist) carry more weight during recalibration. TYPE: `float`
`evidence`	Free-text description of the evidence supporting this correction (e.g., `"Listed in vinyl liner notes, track 3"`). TYPE: `str or None`

Examples:

>>> import uuid
>>> correction = Correction(
...     field="role",
...     current_value="PERFORMER",
...     corrected_value="PRODUCER",
...     entity_id=uuid.uuid4(),
...     confidence_in_correction=0.95,
...     evidence="Confirmed in studio session notes",
... )

FeedbackCard ¶

Bases: BaseModel

Structured feedback from a domain expert (BO-4).

The FeedbackCard is the reverse-flow boundary object in the five-pipeline architecture, flowing from the Chat Interface back into the Attribution Engine. It captures structured corrections and an overall assessment from a domain expert who reviewed an AttributionRecord.

A valid FeedbackCard must contain either corrections or free-text (or both). The center_bias_flag is automatically set when the overall assessment falls in the [0.45, 0.55] range, indicating potential anchoring bias toward the midpoint.

ATTRIBUTE	DESCRIPTION
`schema_version`	Semantic version of the FeedbackCard schema. Defaults to `"1.0.0"`. TYPE: `str`
`feedback_id`	Unique identifier for this feedback card. Auto-generated UUIDv4. TYPE: `UUID`
`attribution_id`	UUID of the `AttributionRecord` being reviewed. TYPE: `UUID`
`reviewer_id`	Identifier of the reviewer (may be an email, username, or external ID). TYPE: `str`
`reviewer_role`	Domain expertise of the reviewer (ARTIST, MANAGER, MUSICOLOGIST, PRODUCER, FAN). TYPE: `ReviewerRoleEnum`
`attribution_version`	Version of the `AttributionRecord` at the time of review. Minimum 1. Prevents stale feedback on updated records. TYPE: `int`
`corrections`	Specific field-level corrections proposed by the reviewer. May be empty if only free-text feedback is provided. TYPE: `list of Correction`
`overall_assessment`	Reviewer's overall assessment of the attribution quality, range [0.0, 1.0]. 0.0 = completely wrong; 1.0 = perfect. TYPE: `float`
`center_bias_flag`	Automatically set to True if `overall_assessment` is in [0.45, 0.55], indicating potential anchoring bias. TYPE: `bool`
`free_text`	Free-text feedback for nuances not captured by structured corrections. TYPE: `str or None`
`evidence_type`	Type of evidence supporting the feedback (LINER_NOTES, MEMORY, DOCUMENT, SESSION_NOTES, OTHER). TYPE: `EvidenceTypeEnum`
`submitted_at`	UTC timestamp when the feedback was submitted. Must be timezone-aware. TYPE: `datetime`

Examples:

>>> import uuid
>>> from datetime import datetime, UTC
>>> card = FeedbackCard(
...     attribution_id=uuid.uuid4(),
...     reviewer_id="imogen.heap@example.com",
...     reviewer_role=ReviewerRoleEnum.ARTIST,
...     attribution_version=1,
...     corrections=[
...         Correction(
...             field="role",
...             current_value="PERFORMER",
...             corrected_value="SONGWRITER",
...             confidence_in_correction=1.0,
...         ),
...     ],
...     overall_assessment=0.7,
...     evidence_type=EvidenceTypeEnum.MEMORY,
...     submitted_at=datetime.now(UTC),
... )

See Also

AttributionRecord : The record being reviewed.
Correction : Individual field-level corrections.

validate_submitted_at `classmethod` ¶

validate_submitted_at(v: datetime) -> datetime

submitted_at must be timezone-aware.

Source code in src/music_attribution/schemas/feedback.py

@field_validator("submitted_at")
@classmethod
def validate_submitted_at(cls, v: datetime) -> datetime:
    """submitted_at must be timezone-aware."""
    if v.tzinfo is None:
        msg = "submitted_at must be timezone-aware (UTC)"
        raise ValueError(msg)
    return v

validate_content_not_empty ¶

validate_content_not_empty() -> FeedbackCard

A feedback card must have corrections or free_text.

Source code in src/music_attribution/schemas/feedback.py

@model_validator(mode="after")
def validate_content_not_empty(self) -> FeedbackCard:
    """A feedback card must have corrections or free_text."""
    if not self.corrections and self.free_text is None:
        msg = "FeedbackCard must have non-empty corrections or non-None free_text"
        raise ValueError(msg)
    return self

validate_center_bias ¶

validate_center_bias() -> FeedbackCard

Set center_bias_flag if overall_assessment is in [CENTER_BIAS_LOW, CENTER_BIAS_HIGH].

Source code in src/music_attribution/schemas/feedback.py

@model_validator(mode="after")
def validate_center_bias(self) -> FeedbackCard:
    """Set center_bias_flag if overall_assessment is in [CENTER_BIAS_LOW, CENTER_BIAS_HIGH]."""
    if CENTER_BIAS_LOW <= self.overall_assessment <= CENTER_BIAS_HIGH:
        object.__setattr__(self, "center_bias_flag", True)
    return self
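
The center-bias rule, together with one way `attribution_version` could be used to reject stale feedback, can be sketched as plain functions. The constant values are taken from the documented [0.45, 0.55] range; `is_stale` is a hypothetical check and not something the schema is documented to enforce.

```python
CENTER_BIAS_LOW = 0.45   # lower bound of the documented range
CENTER_BIAS_HIGH = 0.55  # upper bound of the documented range


def center_bias_flag(overall_assessment: float) -> bool:
    """True when the assessment falls in the inclusive midpoint band."""
    return CENTER_BIAS_LOW <= overall_assessment <= CENTER_BIAS_HIGH


def is_stale(card_attribution_version: int, record_version: int) -> bool:
    """Hypothetical check: the card reviewed an older version of the record."""
    return card_attribution_version < record_version


flags = [center_bias_flag(x) for x in (0.45, 0.50, 0.55, 0.70)]
```

Both bounds are inclusive, so an assessment of exactly 0.45 or 0.55 is flagged.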

Uncertainty¶

uncertainty ¶

Uncertainty-aware provenance schema models.

Provides decomposed uncertainty tracking for every step of the attribution pipeline. These models are attached to ProvenanceEvent and AttributionRecord objects, enabling transparent communication of why a confidence score is what it is, not just what it is.

Academic grounding:

UProp (Duan 2025, arXiv:2506.17419) -- intrinsic/extrinsic decomposition of uncertainty propagation across pipeline steps.
Liu (2025, arXiv:2503.15850) -- 4-dimensional uncertainty framework (input, reasoning, parameter, prediction).
Yanez (2025, Patterns) -- confidence-weighted source integration for multi-source fusion.
Tian (2025, arXiv:2508.06225) -- TH-Score for overconfidence detection in LLM-based systems.
Tripathi (2025, arXiv:2506.23464) -- H-Score and Expected Calibration Improvement (ECI) metrics.
Zhang (2026, arXiv:2601.15778) -- trajectory-level Holistic Trajectory Calibration (HTC).

See Also

music_attribution.schemas.attribution : Uses these models in provenance.
Teikari, P. (2026). Music Attribution with Transparent Confidence, section 5 (uncertainty framework).

StepUncertainty ¶

Bases: BaseModel

Per-step uncertainty decomposition (UProp, Duan 2025).

Tracks intrinsic (data noise) and extrinsic (model/pipeline) uncertainty for each processing step in the attribution pipeline, plus an optional 4-dimensional decomposition (Liu 2025). This enables fine-grained analysis of where uncertainty enters and accumulates.

The total_uncertainty must be >= intrinsic_uncertainty (validated at runtime), since total includes both intrinsic and extrinsic components.

ATTRIBUTE	DESCRIPTION
`step_id`	Unique identifier for this pipeline step (e.g., `"etl-musicbrainz"`, `"resolution-fuzzy"`). TYPE: `str`
`step_name`	Human-readable name of the pipeline step (e.g., `"MusicBrainz ETL"`, `"Fuzzy String Resolution"`). TYPE: `str`
`step_index`	Zero-based position of this step in the pipeline sequence. TYPE: `int`
`stated_confidence`	Raw confidence before calibration, range [0.0, 1.0]. TYPE: `float`
`calibrated_confidence`	Confidence after post-hoc calibration, range [0.0, 1.0]. May differ significantly from `stated_confidence` if the step exhibits systematic over- or under-confidence. TYPE: `float`
`intrinsic_uncertainty`	Uncertainty from the input data itself (noise, conflicts, missing fields), range [0.0, 1.0]. TYPE: `float`
`extrinsic_uncertainty`	Uncertainty from the model/algorithm (embedding limitations, threshold sensitivity), range [0.0, 1.0]. TYPE: `float`
`total_uncertainty`	Combined uncertainty, range [0.0, 1.0]. Must be >= `intrinsic_uncertainty`. TYPE: `float`
`input_uncertainty`	4-D decomposition: input dimension (Liu 2025), range [0.0, 1.0]. None if 4-D decomposition not computed. TYPE: `float or None`
`reasoning_uncertainty`	4-D decomposition: reasoning dimension, range [0.0, 1.0]. TYPE: `float or None`
`parameter_uncertainty`	4-D decomposition: parameter dimension, range [0.0, 1.0]. TYPE: `float or None`
`prediction_uncertainty`	4-D decomposition: prediction dimension, range [0.0, 1.0]. TYPE: `float or None`
`uncertainty_sources`	Classification of uncertainty sources active in this step (INTRINSIC, EXTRINSIC, ALEATORIC, EPISTEMIC). TYPE: `list of UncertaintySourceEnum`
`confidence_method`	Method used to produce the confidence estimate for this step. TYPE: `ConfidenceMethodEnum`
`preceding_step_ids`	IDs of pipeline steps that feed into this step. Used for UProp uncertainty propagation tracking. TYPE: `list of str`

Examples:

>>> step = StepUncertainty(
...     step_id="etl-musicbrainz",
...     step_name="MusicBrainz ETL",
...     step_index=0,
...     stated_confidence=0.87,
...     calibrated_confidence=0.82,
...     intrinsic_uncertainty=0.10,
...     extrinsic_uncertainty=0.05,
...     total_uncertainty=0.15,
...     confidence_method=ConfidenceMethodEnum.SELF_REPORT,
... )

validate_total_ge_intrinsic ¶

validate_total_ge_intrinsic() -> StepUncertainty

total_uncertainty must be >= intrinsic_uncertainty.

Source code in src/music_attribution/schemas/uncertainty.py

@model_validator(mode="after")
def validate_total_ge_intrinsic(self) -> StepUncertainty:
    """total_uncertainty must be >= intrinsic_uncertainty."""
    if self.total_uncertainty < self.intrinsic_uncertainty:
        msg = "total_uncertainty must be >= intrinsic_uncertainty"
        raise ValueError(msg)
    return self
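
Since `preceding_step_ids` links steps together, a consumer may want to confirm that the links are consistent with `step_index`. The sketch below is one plausible check, assuming every predecessor must exist and have a lower index; the schema itself is not documented to enforce this, and `check_step_order` is a hypothetical helper operating on plain dicts.

```python
def check_step_order(steps: list[dict]) -> None:
    """Raise ValueError if a step names an unknown or non-earlier predecessor."""
    index_by_id = {step["step_id"]: step["step_index"] for step in steps}
    for step in steps:
        for parent_id in step.get("preceding_step_ids", []):
            if parent_id not in index_by_id:
                raise ValueError(f"unknown preceding step: {parent_id}")
            if index_by_id[parent_id] >= step["step_index"]:
                raise ValueError(f"{parent_id} does not precede {step['step_id']}")


pipeline = [
    {"step_id": "etl-musicbrainz", "step_index": 0, "preceding_step_ids": []},
    {
        "step_id": "resolution-fuzzy",
        "step_index": 1,
        "preceding_step_ids": ["etl-musicbrainz"],
    },
]
check_step_order(pipeline)  # passes silently
```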

SourceContribution ¶

Bases: BaseModel

Per-source confidence with calibration quality (Yanez 2025).

Tracks how much each data source contributed to the final attribution, with calibration quality indicating how reliable that source's confidence estimates historically are. Sources with higher calibration_quality receive higher weights in the aggregation.

ATTRIBUTE	DESCRIPTION
`source`	The data source (e.g., MUSICBRAINZ, DISCOGS, ARTIST_INPUT). TYPE: `SourceEnum`
`confidence`	This source's confidence in its contribution, range [0.0, 1.0]. TYPE: `float`
`weight`	Normalised weight of this source in the final aggregation, range [0.0, 1.0]. Weights across all sources sum to 1.0. TYPE: `float`
`calibration_quality`	Historical calibration quality of this source's confidence estimates, range [0.0, 1.0]. 1.0 = perfectly calibrated (stated confidence matches empirical accuracy). TYPE: `float`
`record_count`	Number of records this source contributed to the attribution. Non-negative. TYPE: `int`
`is_human`	Whether this source is human-provided (e.g., ARTIST_INPUT). Human sources may receive preferential weighting for subjective fields. TYPE: `bool`

Examples:

>>> contrib = SourceContribution(
...     source=SourceEnum.MUSICBRAINZ,
...     confidence=0.90,
...     weight=0.45,
...     calibration_quality=0.85,
...     record_count=3,
... )
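
The role of `weight` can be illustrated with a weighted mean. This sketch assumes raw weights proportional to each source's `calibration_quality` and a simple weighted average of per-source confidences; the actual aggregation used by the Attribution Engine is not described here, and `normalise` and `aggregate` are hypothetical helpers.

```python
import math


def normalise(raw_weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative raw weights so that they sum to 1.0."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {source: weight / total for source, weight in raw_weights.items()}


def aggregate(confidences: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-source confidences; weights must sum to 1.0."""
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0")
    return sum(weights[source] * confidences[source] for source in confidences)


# Assumed inputs: raw weights set equal to calibration_quality.
weights = normalise({"MUSICBRAINZ": 0.9, "DISCOGS": 0.6})
score = aggregate({"MUSICBRAINZ": 0.90, "DISCOGS": 0.80}, weights)
```

Here the normalised weights come out at roughly 0.6 and 0.4, giving an aggregate of about 0.86.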

CalibrationMetadata ¶

Bases: BaseModel

Per-step calibration metrics (Tian 2025 TH-Score).

Records calibration quality for confidence scores, including expected calibration error (ECE), calibration set size, and the method used. Lower ECE indicates better calibration (stated confidence matches empirical accuracy).

ATTRIBUTE	DESCRIPTION
`expected_calibration_error`	Expected Calibration Error (ECE), the average absolute difference between confidence and accuracy across bins. Non-negative. Lower is better; 0.0 = perfectly calibrated. TYPE: `float`
`calibration_set_size`	Number of examples used for calibration. Larger sets give more reliable ECE estimates. Non-negative. TYPE: `int`
`status`	Current calibration status (CALIBRATED, UNCALIBRATED, PENDING). TYPE: `CalibrationStatusEnum`
`method`	Name of the calibration method used (e.g., `"platt_scaling"`, `"isotonic_regression"`, `"temperature_scaling"`). None if uncalibrated. TYPE: `str or None`

Examples:

>>> cal = CalibrationMetadata(
...     expected_calibration_error=0.03,
...     calibration_set_size=500,
...     status=CalibrationStatusEnum.CALIBRATED,
...     method="platt_scaling",
... )
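
For reference, ECE is conventionally computed by binning predictions by confidence and taking the sample-weighted average of |confidence - accuracy| per bin. The sketch below uses equal-width bins; the bin count of 10 is an assumption, and the function is illustrative rather than the package's own implementation.

```python
def expected_calibration_error(
    confidences: list[float], outcomes: list[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE over paired confidences and correctness flags."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must be non-empty and aligned")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, correct))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, correct in members if correct) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Four predictions at 0.9 confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In this example all predictions land in one bin with mean confidence 0.9 and accuracy 0.75, so the ECE is approximately 0.15.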

OverconfidenceReport ¶

Bases: BaseModel

Overconfidence detection report (Tripathi 2025 H-Score, ECI).

Detects when stated confidence exceeds actual accuracy, a common failure mode in LLM-based systems. The overconfidence_gap is the primary diagnostic: positive = overconfident, negative = underconfident, zero = perfectly calibrated.

ATTRIBUTE	DESCRIPTION
`stated_confidence`	The system's stated confidence, range [0.0, 1.0]. TYPE: `float`
`actual_accuracy`	Empirically measured accuracy on a validation set, range [0.0, 1.0]. TYPE: `float`
`overconfidence_gap`	`stated_confidence - actual_accuracy`. Positive values indicate overconfidence; negative values indicate underconfidence. Can range from -1.0 to 1.0. TYPE: `float`
`th_score`	TH-Score from Tian (2025). Measures hallucination tendency. None if not computed. TYPE: `float or None`
`h_score`	H-Score from Tripathi (2025). Measures honesty of confidence estimates. None if not computed. TYPE: `float or None`
`eci`	Expected Calibration Improvement (ECI) from Tripathi (2025). How much calibration could be improved. None if not computed. TYPE: `float or None`

Examples:

>>> report = OverconfidenceReport(
...     stated_confidence=0.92,
...     actual_accuracy=0.85,
...     overconfidence_gap=0.07,
...     th_score=0.12,
... )
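
The `overconfidence_gap` field is a plain difference, which the following sketch computes and labels. The helper names and the `tolerance` used to call a gap "calibrated" are assumptions for illustration.

```python
import math


def overconfidence_gap(stated_confidence: float, actual_accuracy: float) -> float:
    """stated_confidence - actual_accuracy, with both inputs in [0.0, 1.0]."""
    for name, value in (
        ("stated_confidence", stated_confidence),
        ("actual_accuracy", actual_accuracy),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0.0, 1.0]")
    return stated_confidence - actual_accuracy


def describe_gap(gap: float, tolerance: float = 1e-9) -> str:
    """Label a gap; `tolerance` is a hypothetical floating-point allowance."""
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "calibrated"


gap = overconfidence_gap(0.92, 0.85)
label = describe_gap(gap)
```

Floating-point subtraction means the result is only approximately 0.07, so comparisons should use `math.isclose` rather than `==`.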

TrajectoryCalibration ¶

Bases: BaseModel

Trajectory-level calibration (Zhang 2026, HTC).

Tracks confidence dynamics across the full pipeline, treating the sequence of confidence scores at each step as a trajectory. The trajectory shape (increasing, decreasing, stable, volatile) is itself a calibration signal: a volatile trajectory often indicates an unreliable final confidence.

The optional htc_feature_vector is a 48-dimensional feature vector extracted from the trajectory for use with the HTC calibration method.

ATTRIBUTE	DESCRIPTION
`trajectory_id`	Unique identifier for this trajectory (typically matches the attribution record ID). TYPE: `str`
`step_count`	Number of pipeline steps in the trajectory. Minimum 1. TYPE: `int`
`confidence_trend`	Classified trend of confidence across steps (INCREASING, DECREASING, STABLE, VOLATILE). TYPE: `ConfidenceTrendEnum`
`initial_confidence`	Confidence at the first pipeline step, range [0.0, 1.0]. TYPE: `float`
`final_confidence`	Confidence at the last pipeline step, range [0.0, 1.0]. TYPE: `float`
`htc_feature_vector`	48-dimensional feature vector for HTC calibration (Zhang 2026). Must be exactly length 48 when provided. None if HTC is not used. TYPE: `list of float or None`

Examples:

>>> traj = TrajectoryCalibration(
...     trajectory_id="attr-12345",
...     step_count=4,
...     confidence_trend=ConfidenceTrendEnum.INCREASING,
...     initial_confidence=0.65,
...     final_confidence=0.92,
... )

validate_htc_vector_length `classmethod` ¶

validate_htc_vector_length(
    v: list[float] | None,
) -> list[float] | None

HTC feature vector must be length 48 when provided.

Source code in src/music_attribution/schemas/uncertainty.py

@field_validator("htc_feature_vector")
@classmethod
def validate_htc_vector_length(cls, v: list[float] | None) -> list[float] | None:
    """HTC feature vector must be length 48 when provided."""
    if v is not None and len(v) != 48:
        msg = "htc_feature_vector must be length 48"
        raise ValueError(msg)
    return v
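
How `confidence_trend` is derived is not specified in this reference. As a rough illustration only, the sketch below classifies a trajectory from its step-to-step changes, using a hypothetical `tolerance` below which a change is ignored; it is not the HTC method of Zhang (2026).

```python
def classify_trend(confidences: list[float], tolerance: float = 0.02) -> str:
    """Classify a confidence trajectory from its successive differences."""
    if not confidences:
        raise ValueError("trajectory needs at least one step")
    deltas = [later - earlier for earlier, later in zip(confidences, confidences[1:])]
    rises = any(delta > tolerance for delta in deltas)
    falls = any(delta < -tolerance for delta in deltas)
    if rises and falls:
        return "VOLATILE"
    if rises:
        return "INCREASING"
    if falls:
        return "DECREASING"
    return "STABLE"


trend = classify_trend([0.65, 0.75, 0.85, 0.92])
```

One limitation of this simple rule is that a slow drift made of many sub-tolerance steps is reported as stable.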

UncertaintyAwareProvenance ¶

Bases: BaseModel

Top-level uncertainty summary for an AttributionRecord.

Aggregates step-level uncertainties, source contributions, calibration metadata, overconfidence diagnostics, and trajectory calibration into a single summary. Attached to each AttributionRecord as uncertainty_summary.

This is the primary structure for answering "why is the confidence what it is?" -- enabling transparent uncertainty communication to end users and downstream systems.

ATTRIBUTE	DESCRIPTION
`steps`	Per-step uncertainty decomposition for each pipeline step. Ordered by `step_index`. TYPE: `list of StepUncertainty`
`source_contributions`	Per-source confidence and weight breakdown. Shows how much each data source influenced the final score. TYPE: `list of SourceContribution`
`calibration`	Overall calibration metrics for the record's confidence score. None if calibration has not been performed. TYPE: `CalibrationMetadata or None`
`overconfidence`	Overconfidence diagnostic report. None if not computed. TYPE: `OverconfidenceReport or None`
`trajectory`	Trajectory-level calibration data (HTC). None if trajectory analysis was not performed. TYPE: `TrajectoryCalibration or None`
`total_uncertainty`	Aggregated total uncertainty across all steps, range [0.0, 1.0]. Defaults to 0.0. TYPE: `float`
`dominant_uncertainty_source`	The primary source of uncertainty in this record (INTRINSIC, EXTRINSIC, ALEATORIC, or EPISTEMIC). None if not determined. TYPE: `UncertaintySourceEnum or None`

Examples:

>>> summary = UncertaintyAwareProvenance(
...     total_uncertainty=0.18,
...     dominant_uncertainty_source=UncertaintySourceEnum.EPISTEMIC,
... )

See Also

AttributionRecord : Parent record containing this summary.
StepUncertainty : Per-step decomposition detail.
