
This page explains how I used 2019/20 season analysis data from the Institute for Composer Diversity (ICD) in my posts. You can find more information about the ICD analysis on the page summarizing different research on orchestra seasons.

Although the Institute is part of the State University of New York at Fredonia, it’s currently being run on an entirely volunteer basis. It was a lot of work for me to go through over 600 playlist tracks, so I can only imagine how much effort was involved in putting together the ICD 2019/20 orchestra season dataset, which has over 4,000 pieces played by 120 orchestras. And that’s only one of their smaller projects. I donated to support their work, and you should donate to them too, if you are able.

Reworking the Season Analysis Data

In order to use the ICD’s data, I transformed their source spreadsheet into a single table. You can view the Jupyter notebook used to create the table from the ICD Season data here. The data is all pulled from the original spreadsheet linked from the ICD website.

The purpose for doing this data transformation was to:

Summary Statistics

The summary tabs in the original spreadsheet include two “Combined Percentages” columns named “W&H” and “W&H&L” (W stands for Women; H stands for composers from underrepresented racial, ethnic, or cultural heritages; L stands for Living).

However, these are an average of the individual columns for each attribute, not the percentage or count of works that meet both (or all three) conditions at once. With the transformed data, I was able to generate statistics based on the latter.
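To make the distinction concrete, here is a minimal sketch using made-up records. The field names (`women`, `heritage`) and the values are assumptions for illustration only; they are not taken from the ICD spreadsheet or from my notebook.

```python
# Toy records; field names and values are hypothetical, not ICD data.
works = [
    {"women": 1, "heritage": 0},
    {"women": 0, "heritage": 1},
    {"women": 1, "heritage": 1},
    {"women": 0, "heritage": 0},
    {"women": 0, "heritage": 0},
]

n = len(works)

# Percentage of works for each attribute taken on its own
pct_women = 100 * sum(w["women"] for w in works) / n
pct_heritage = 100 * sum(w["heritage"] for w in works) / n

# "Combined" figure computed as an average of the individual columns
avg_of_columns = (pct_women + pct_heritage) / 2

# Share of works that meet both conditions at the same time
pct_both = 100 * sum(1 for w in works if w["women"] and w["heritage"]) / n

print(avg_of_columns, pct_both)
```

In this toy example the average of the two columns is 40%, while only 20% of the works are by composers who meet both conditions, so the two figures answer different questions.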

In addition, the overall averages on the FULL 120 STATS tab of the original spreadsheet are based on an average of each orchestra's average, whereas the numbers I used in the playlist analysis series are based on a direct average across the whole dataset. The two methods give slightly different results because an average of averages weights every orchestra equally regardless of how many pieces it played. In this case the difference is minor (the whole-dataset average that I used is a couple of tenths of a percentage point lower).
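The sketch below shows why the two methods can disagree. The three orchestras and their counts are invented for illustration and do not come from the ICD data.

```python
# Hypothetical orchestras: (works by women composers, total works played)
orchestras = [(1, 10), (6, 40), (0, 50)]

# Method 1: average of each orchestra's own percentage
per_orchestra = [100 * women / total for women, total in orchestras]
average_of_averages = sum(per_orchestra) / len(per_orchestra)

# Method 2: direct average over every work in the pooled dataset
total_women = sum(women for women, _ in orchestras)
total_works = sum(total for _, total in orchestras)
pooled_average = 100 * total_women / total_works

print(round(average_of_averages, 2), pooled_average)
```

Here the average of averages comes out at about 8.33%, while the pooled figure is 7%, because the orchestra with the largest season and no qualifying works counts for more in the pooled calculation.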

Minor Errors

  • In the original sheet, the ranges used for some of the SUM formulas on individual orchestra tabs excluded the last row, meaning a small number of records were not counted in the summary tabs. I have counted them in the numbers presented in the posts.

  • In the original sheet, a handful of entries have data entry errors: they are missing a “1” in the Living or Heritages fields. (You can tell because the sum for the demographic field is less than the number of times the composer’s name appears.) I have corrected these undercounts in the posts (e.g. for the full 120 group, the number of composers from underrepresented Heritages is undercounted by six, and the number of living composers is undercounted by nine). Note: one composer passed away in the middle of the concert season and is listed as living for some pieces and as not living for others. I assumed this was correct rather than a data entry error, though the ICD data does not include original performance dates, so I was unable to verify this assumption.
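The check described in the second bullet (a demographic sum that is lower than the number of times a composer appears) could be sketched roughly as follows. The row layout, the field names, and the helper `find_inconsistent` are hypothetical, chosen for illustration; this is not the code from my notebook.

```python
from collections import defaultdict

# Toy rows; composer names and flags are made up for illustration.
rows = [
    {"composer": "Composer A", "living": 1},
    {"composer": "Composer A", "living": 0},  # possibly a missing "1"
    {"composer": "Composer A", "living": 1},
    {"composer": "Composer B", "living": 0},
    {"composer": "Composer B", "living": 0},
]

def find_inconsistent(rows, field):
    """Return composers flagged on some rows but not all of them."""
    counts = defaultdict(lambda: [0, 0])  # composer -> [appearances, flag sum]
    for row in rows:
        counts[row["composer"]][0] += 1
        counts[row["composer"]][1] += row[field]
    return {
        name: (appearances, flag_sum)
        for name, (appearances, flag_sum) in counts.items()
        if 0 < flag_sum < appearances
    }

suspects = find_inconsistent(rows, "living")
print(suspects)
```

A check like this only surfaces candidates. Each one still needs a manual look, since a mixed result can be legitimate, as with the composer who passed away mid-season.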

Calculating “Average US Orchestra” Comparator

In the deep dive posts for Apple Music and Spotify classical playlists, I created a comparison playlist based on the demographic percentages from the full group of 120 orchestras analyzed by the ICD.

The calculations are pretty simple, e.g. for the Apple Music Classical Essentials playlist, which has 101 tracks, the number of women composers on the “If Average US Orchestra” comparison playlist would be as follows:

101 tracks × 7.9% women composers in ICD analysis = 7.979 → 8 women composers
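The same calculation can be written as a small helper. The function name `expected_count` is my own for this sketch. Note that Python's built-in `round` rounds exact halves to the nearest even number, which is an assumption about tie handling that the calculation above does not specify.

```python
def expected_count(track_count, percentage):
    """Scale a playlist's track count by a demographic percentage and round."""
    return round(track_count * percentage / 100)

# Apple Music Classical Essentials: 101 tracks, 7.9% women composers
women_on_comparison_playlist = expected_count(101, 7.9)
print(women_on_comparison_playlist)
```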