Paper at ISMIR 2023 on the Ethics of AI Training Sets

A paper that I co-authored with Megha Sharma and I-Chieh Wei (Sam) has been accepted for ISMIR 2023. Looking forward to presenting it in my home country in November!

Data Collection in Music Generation Training Sets: A Critical Analysis
Fabio Morreale, Megha Sharma, I-Chieh Wei

The practices of data collection in training sets for Automatic Music Generation (AMG) tasks are opaque and overlooked. In this paper, we aimed to identify these practices and surface the values they embed. We systematically identified all datasets used to train AMG models presented at the last ten editions of ISMIR. For each dataset, we checked how it was populated and the extent to which musicians wittingly contributed to its creation. Almost half of the datasets (42.6%) were indiscriminately populated by accumulating music data available online without seeking any sort of permission. We discuss the ideologies that underlie this practice and propose a number of suggestions AMG dataset creators might follow. Overall, this paper contributes to the emerging self-critical corpus of work of the ISMIR community, reflecting on the ethical considerations and the social responsibility of our work.