(Hypebot) — Millions in music royalties are lost or delayed every year, and the culprit is often as simple as duplicate data. Find out how messy metadata is quietly choking the music industry and what it will take to finally clean it up.
Why Duplicate Data is Costing Musicians and the Music Industry Millions
By Jacob Varghese, Founder and Director at Noctil
“It’s a problem that affects everyone”
In the complex world of music rights management, a quiet but persistent problem is draining resources and delaying payments: duplicate data. This issue, often hidden within vast databases, creates significant operational inefficiencies and financial challenges. It’s a problem that affects everyone, from major labels and publishers to individual artists and songwriters, ultimately impacting the flow of money throughout the music ecosystem.
The core of the problem lies in redundant metadata. Metadata is the descriptive information about a piece of music: details like artist names, song titles, composers, producers, publishers, and essential identifiers such as ISRCs (International Standard Recording Codes) and ISWCs (International Standard Musical Work Codes). When this information is duplicated across different systems or entered inconsistently, it creates a web of conflicting records that is difficult to unpick.
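To see how duplicates arise in practice, here is a minimal sketch in Python of two catalogue entries for the same recording. The schema, names, and identifier values are all invented for illustration, not drawn from any real registry:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackRecord:
    """One catalogue row; the fields mirror the metadata described above,
    but the schema itself is purely illustrative."""
    title: str
    artist: str
    composer: Optional[str] = None
    isrc: Optional[str] = None   # identifies the sound recording
    iswc: Optional[str] = None   # identifies the underlying musical work

# The same recording, registered twice by different teams (hypothetical values):
entry_a = TrackRecord("Midnight Train", "The Examples", isrc="USABC2100001")
entry_b = TrackRecord("Midnight Train ", "Examples, The", composer="A. Writer")

# No field matches exactly, so a naive system keeps both rows,
# and every royalty statement now has two places the money could go.
assert entry_a != entry_b
```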
Excessive manual work
One of the most immediate consequences of duplicate data is the surge in manual work. Rights organisations, collecting societies, and even individual artist teams are forced to dedicate significant time and resources to identifying and resolving duplicate entries. Imagine a database with multiple entries for the same song or artist, perhaps with slight variations in spelling, different identifiers, or incomplete information. Each duplicate requires human intervention to compare, verify, and merge, a process that is both time-consuming and prone to further error. This diverts valuable staff away from more productive tasks, such as business development, artist relations, or strategic planning. Instead, they are stuck in a cycle of data cleaning, a task that offers little to no direct return on investment. The reliance on spreadsheets for managing large volumes of data also introduces human error, making the problem worse.
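Much of that manual comparison can at least be triaged automatically. The sketch below uses Python's standard-library difflib to flag likely duplicates for human review; the similarity threshold and the fields compared are assumptions chosen for illustration, not industry practice:

```python
from difflib import SequenceMatcher

def squash(s: str) -> str:
    """Lowercase and drop punctuation/whitespace so trivial variants compare equal."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def flag_duplicates(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Flag two catalogue rows as candidate duplicates for human review.
    The 0.9 threshold is an illustrative assumption, not a standard."""
    title_sim = SequenceMatcher(None, squash(a["title"]), squash(b["title"])).ratio()
    artist_sim = SequenceMatcher(None, squash(a["artist"]), squash(b["artist"])).ratio()
    return title_sim >= threshold and artist_sim >= threshold

row_a = {"title": "Midnight Train", "artist": "The Examples"}
row_b = {"title": "midnight  train", "artist": "The Exmples"}  # spacing drift and a typo
print(flag_duplicates(row_a, row_b))  # True: queued for review, not auto-merged
```

Note that the output is a flag, not a merge: as the paragraph above suggests, human verification remains the final step.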
Delayed royalty distributions
The direct link between messy, duplicated data and delayed royalty distributions is a major concern. When usage data comes in from various digital providers, broadcasters, and other sources, it needs to be de-duplicated and matched accurately with the correct musical works and sound recordings, and then linked to their respective rights holders. Duplicates produce multiple candidate matches for a single usage line, and each candidate must be reviewed and resolved by hand.
“In an industry where cash flow is vital for creators, these hold-ups can be frustrating and financially damaging.”
If a song has multiple, slightly different entries in a database, the system may fail to recognise that a single usage report refers to the same work or sound recording. This leads to discrepancies, requiring manual reconciliation. These delays mean that artists, songwriters, publishers, and labels receive their due payments much later than they should. In an industry where cash flow is vital for creators, these hold-ups can be frustrating and financially damaging.
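A rough sketch of that matching step shows exactly where duplicates bite: a trusted identifier resolves a usage line immediately, while the text-based fallback returns one candidate per duplicate entry, and every extra candidate is another manual reconciliation task. All records and names here are invented for illustration:

```python
def norm(s: str) -> str:
    """Lowercase, strip punctuation, and sort tokens so word-order variants compare equal."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in s.lower()).split()
    return " ".join(sorted(tokens))

def candidates(usage: dict, catalogue: list[dict]) -> list[dict]:
    """Candidate catalogue entries for one usage line: an identifier
    match wins outright; otherwise fall back to normalized text."""
    if usage.get("isrc"):
        hits = [r for r in catalogue if r.get("isrc") == usage["isrc"]]
        if hits:
            return hits
    return [r for r in catalogue
            if norm(r["title"]) == norm(usage["title"])
            and norm(r["artist"]) == norm(usage["artist"])]

catalogue = [
    {"title": "Midnight Train", "artist": "The Examples", "isrc": "USABC2100001"},
    {"title": "Midnight Train", "artist": "Examples, The", "isrc": None},  # duplicate entry
]
usage = {"title": "midnight train", "artist": "the examples", "isrc": None}
print(len(candidates(usage, catalogue)))  # 2 candidates -> manual reconciliation
```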
The “Black Box” effect: Millions of dollars annually
Duplicate information is a significant contributor to the problem of “black box” royalties. These are revenues generated from music usage that cannot be accurately matched to a specific rights holder, often due to incomplete, inaccurate, or conflicting metadata. When systems cannot confidently identify the rightful recipient of royalties because of duplicated or ambiguous data, that money can end up in unallocated accounts.
Estimates for these unallocated royalties vary, but they represent millions of pounds or dollars annually. For artists and rights holders, this creates a lack of transparency; their money enters a “mysterious void” where it is difficult to trace. This absence of clear visibility undermines trust within the industry and makes it challenging for creators to understand how their work is performing and how they are being compensated. The “black box” is a symptom of a fragmented information landscape where different industry segments focus on distinct types of rights and data, leading to varied data requirements and use cases that often don’t align.
Proactive prevention: Taming the data beast
Addressing duplicate data requires a two-pronged approach: identifying existing issues and, crucially, preventing new duplicates from emerging. The good news is that solutions and best practices exist to tackle this problem.
- Prioritise data quality at the source: The most effective way to prevent duplicates is to ensure data is accurate and complete from the moment it is first entered. This means establishing clear data entry guidelines, educating teams, and using standardised templates. Rights holders must prioritise the accurate and complete submission of metadata, including essential identifiers like ISRCs, ISWCs, and IPI numbers; a minimal format check for these codes is sketched after this list.
- Implement robust data governance: Organisations need to establish clear policies and procedures for how data is collected, stored, managed, and shared. This includes defining data ownership, access permissions, and a single source of truth for critical information. A well-defined data strategy is not just a technical exercise; it ensures everyone in the organisation works from the same trusted picture of the catalogue.
- Use data validation and cleansing tools: Technology can play a significant role here. Automated data validation tools can flag inconsistencies or potential duplicates at the point of entry. For existing datasets, data cleansing tools can help identify and merge duplicate records. This might involve setting up automated rules for matching and merging, or using machine learning to identify patterns that indicate duplicates. A simple survivorship-style merge rule is sketched after this list.
- Standardise identifiers: The music industry lacks universally enforced metadata standards, so information is often re-keyed inconsistently as it moves between databases. Advocating for and adopting universal metadata standards (e.g., consistent use of ISRCs, ISWCs, and IPIs) across the industry is vital. This ensures compatibility and seamless data exchange between different systems and stakeholders.
- Foster collaboration: The problem of duplicate data is often a result of fragmented systems and a lack of communication between different parts of the industry. Encouraging greater collaboration between labels, publishers, distributors, collecting societies, and digital platforms can lead to shared best practices and more harmonised data exchange. Organisations like The MLC in the USA, with its “Supplemental Matching Network,” are good examples of collaborative efforts to improve data matching.
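As a concrete example of validating identifiers at the point of entry, here is a deliberately simplified sketch that checks only the basic shape of an ISRC or ISWC. Real validators go further, verifying registrant codes and the ISWC check digit, and the example values below are illustrative:

```python
import re
from typing import Optional

# Simplified structural checks only; real validators also verify
# registrant codes and the ISWC check digit.
PATTERNS = {
    "isrc": re.compile(r"^[A-Z]{2}[A-Z0-9]{3}\d{7}$"),  # country + registrant + year + designation
    "iswc": re.compile(r"^T\d{10}$"),                   # 'T' + 9-digit work number + check digit
}

def clean_identifier(kind: str, raw: str) -> Optional[str]:
    """Strip separators, uppercase, and accept only well-formed codes."""
    code = re.sub(r"[\s.\-]", "", raw).upper()
    return code if PATTERNS[kind].fullmatch(code) else None

print(clean_identifier("isrc", "us-abc-21-00001"))  # 'USABC2100001'
print(clean_identifier("iswc", "T-034.524.680-1"))  # 'T0345246801'
print(clean_identifier("isrc", "USABC21001"))       # None: rejected at entry
```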
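And as a sketch of the automated merge rules mentioned in the third bullet, here is one simple "survivorship" policy: prefer the record that carries a trusted identifier, and fill its gaps from the duplicate. A production system would also record provenance and route genuine conflicts to a human reviewer; the field names here are illustrative:

```python
def merge_records(a: dict, b: dict) -> dict:
    """Keep the most complete value per field, preferring the record
    that carries an ISRC."""
    primary, secondary = (a, b) if a.get("isrc") else (b, a)
    merged = dict(secondary)                                # start from the weaker record
    merged.update({k: v for k, v in primary.items() if v})  # stronger record wins where it has data
    return merged

dup_a = {"title": "Midnight Train", "artist": "The Examples", "isrc": "USABC2100001", "composer": None}
dup_b = {"title": "Midnight Train", "artist": "The Examples", "isrc": None, "composer": "A. Writer"}
print(merge_records(dup_a, dup_b))
# -> one row keeping both the ISRC and the composer credit
```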
By focusing on data quality from the outset, implementing strong governance, and leveraging technology to identify and prevent duplicates, the music industry can significantly reduce manual work, accelerate royalty distributions, and enhance transparency. This shift from reactive problem-solving to proactive prevention is essential for a more efficient and equitable future.