My data visualization project represents traditional Irish tunes visually to highlight their similarities. The idea for the project came from something my mother says to me regularly when I’m home visiting and playing the fiddle: “How do you keep all those tunes straight? They all sound the same to me.”
In its final form, each tune is represented as a node in a graph and is connected to the other tunes it resembles. Nodes are sized according to their weighted degree: the more a tune overlaps with other tunes, the larger its node. It quickly becomes obvious that a handful of jigs overlap heavily with other tunes.
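As a rough illustration of sizing nodes by weighted degree, here is a minimal sketch in Python. The tune names, edge weights, and the size range in `node_sizes` are made up for the example and are not taken from the project.

```python
from collections import defaultdict

def weighted_degree(edges):
    """Sum the weights of all edges touching each node."""
    degree = defaultdict(float)
    for a, b, weight in edges:
        degree[a] += weight
        degree[b] += weight
    return dict(degree)

def node_sizes(degree, min_size=5.0, max_size=40.0):
    """Linearly scale weighted degrees into a drawing-size range (hypothetical range)."""
    lo, hi = min(degree.values()), max(degree.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all degrees are equal
    return {
        node: min_size + (d - lo) / span * (max_size - min_size)
        for node, d in degree.items()
    }

# Hypothetical edges: (tune, tune, similarity weight)
edges = [
    ("Jig A", "Jig B", 0.8),
    ("Jig A", "Jig C", 0.5),
    ("Jig B", "Jig C", 0.4),
    ("Jig A", "Jig D", 0.6),
]
degree = weighted_degree(edges)
sizes = node_sizes(degree)
```

In this toy graph "Jig A" touches the most heavily weighted edges, so it gets the largest node.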
Viewers can zoom in and out using the '+' and '-' keys, translate around the graph by dragging the canvas with their mouse, and highlight connected tunes by hovering over a node. This allows viewers to see the neighborhoods of tunes and discover tunes that bridge between the larger nodes.
For the final representation, I normalized the Levenshtein distance by dividing it by the length of the longer tune in the pair. This normalized measure of similarity then served as the edge weights in the graph. I also used these normalized similarities to filter out edges between tunes that did not share many common phrases: if the normalized similarity was lower than 0.3, the edge was omitted from the graph.
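The normalization and filtering step can be sketched roughly as follows. This assumes each tune is reduced to a string of note letters, and that the normalized distance is turned into a similarity as 1 minus the distance divided by the longer length, so that higher values mean more shared material. Both the encoding and that conversion are assumptions for illustration; only the 0.3 cutoff comes from the description above.

```python
from itertools import combinations

def levenshtein(a, b):
    """Edit distance between two sequences (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(
                previous[j] + 1,         # delete x
                current[j - 1] + 1,      # insert y
                previous[j - 1] + cost,  # substitute x with y (or match)
            ))
        previous = current
    return previous[-1]

def normalized_similarity(a, b):
    """Assumed conversion: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def build_edges(tunes, threshold=0.3):
    """Keep only pairs whose similarity reaches the threshold."""
    edges = []
    for (name_a, notes_a), (name_b, notes_b) in combinations(tunes.items(), 2):
        weight = normalized_similarity(notes_a, notes_b)
        if weight >= threshold:
            edges.append((name_a, name_b, weight))
    return edges

# Hypothetical tunes encoded as note-letter strings
tunes = {"Tune A": "GABcdB", "Tune B": "GABcdc", "Tune C": "efgfed"}
edges = build_edges(tunes)
```

Here only "Tune A" and "Tune B" stay connected, because they differ by a single note while "Tune C" shares almost nothing with either.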
I created several visualizations of different features of the data set to find the best way to navigate the data set and illustrate the similarities and differences between tunes. Some of these were built using Processing and some using matplotlib.
In this exploration, I compared the two jigs with the lowest Levenshtein distance. The x-axis represents time (rhythm) and the y-axis is pitch. Each rectangle represents a note. The width of the rectangle represents the duration of the note, and its vertical position represents the pitch of the note. The two tunes are drawn with semi-transparent fills, one in red, the other in blue. Where the notes overlap, the rectangles become a shade of purple.
In this exploration, I plotted the pitch of each note against its length for the jigs in the data set. Each point is sized according to its frequency. Unlike the previous exploration, notes are not plotted in time, but by the length of the note. It's clear from this plot that most of the notes that occur in jigs are eighth notes ("5" on the x-axis) and that there are about 10–12 common pitches.
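The counting behind a plot like this might look like the sketch below. The note list is invented, and durations are written in quarter-note units (0.5 for an eighth note), which is an assumption and not the encoding used on the plot's x-axis.

```python
from collections import Counter

# Hypothetical notes as (pitch name, duration in quarter-note units)
notes = [
    ("D5", 0.5), ("E5", 0.5), ("D5", 0.5),
    ("G4", 1.0), ("D5", 0.5), ("E5", 0.5),
]

# How often each (pitch, duration) combination occurs
counts = Counter(notes)

def point_sizes(counts, scale=20.0):
    """Make each point's size proportional to how often the combination occurs."""
    return {key: scale * n for key, n in counts.items()}

sizes = point_sizes(counts)
```

The resulting `sizes` dictionary could then be handed to a scatter plot, with pitch on one axis and duration on the other.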
This is interesting when compared to a Beethoven quartet, where the points are much more equally sized, indicating a greater variety in the notes.
This exploration began as a way for me to identify smaller groups of jigs with which to test some of my visualizations. I wanted to see which keys had a small number of jigs so that I could use those subsets of the data. I created a stacked bar graph and placed it around the circle of fifths. Each layer in a given stack corresponds to a different mode. It's clear in this visualization that 1 sharp (G major, E minor, and D Mixolydian) and 2 sharps (D major and B minor) are the most common key signatures.
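To show how keys in different modes land on the same position of the circle of fifths, here is a small sketch. It relies on the standard relationship that Mixolydian, Dorian, and minor modes have one, two, and three fewer sharps than the major key on the same tonic. The list of tunes' keys is invented, and only a few tonics are covered.

```python
from collections import Counter

# Number of sharps for major keys (negative means flats); partial table
MAJOR_SHARPS = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

# Shift relative to the major key on the same tonic
MODE_OFFSET = {"major": 0, "mixolydian": -1, "dorian": -2, "minor": -3}

def sharps(tonic, mode):
    """Key signature, as a count of sharps, for a tonic and mode."""
    return MAJOR_SHARPS[tonic] + MODE_OFFSET[mode]

# Hypothetical keys, one entry per tune
keys = [
    ("G", "major"), ("E", "minor"), ("D", "mixolydian"),
    ("D", "major"), ("B", "minor"), ("A", "dorian"),
]

# One bar per key signature, one layer per mode within that bar
by_signature = Counter(sharps(t, m) for t, m in keys)
layers = Counter((sharps(t, m), m) for t, m in keys)
```

Each entry in `by_signature` corresponds to the total height of one bar, and each entry in `layers` to one layer of the stack.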
I initially tried several ways of visualizing the melodies to highlight how few differences there were between pairs of tunes.
I started by representing each note as a rectangle. Each row represents a tune. When a note is identical between that tune and the reference tune, the note is drawn. If the notes are different, that space is left blank. This gives you a picture of where the tunes are similar, but not how they differ.
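The position-by-position comparison can be sketched in a few lines. The note-letter strings are hypothetical stand-ins for real tunes, and the text rendering in `render_row` only imitates the drawn rectangles.

```python
def matching_positions(reference, tune):
    """Keep a note where it matches the reference at the same position, else None."""
    return [a if a == b else None for a, b in zip(reference, tune)]

def render_row(reference, tune):
    """Text stand-in for one row: matching notes are shown, differences are blank."""
    return "".join(n if n is not None else " " for n in matching_positions(reference, tune))

reference = "GABcdB"  # hypothetical reference tune
other = "GAGcdc"      # hypothetical tune being compared
row = render_row(reference, other)
```

Because notes are compared only at the same index, a phrase that is shifted by even one note shows up as blanks.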
My second attempt overlays all tunes similar enough to the original. Again, each note is represented as a rectangle and only common notes are drawn. Each rectangle is mostly transparent in this visualization. Because all the tunes are drawn on top of each other, the darker the rectangle, the more often that note appears in that position in this set of tunes. Although I think this results in a rather attractive, barcode-like image, it doesn’t say much about how the tunes are similar.
Next, I represented the tunes as line graphs, plotting the notes in order horizontally and according to pitch vertically. Now you can see where the tunes overlap and how they differ when they don’t overlap. Most importantly, this rendition shows when a phrase is repeated in two tunes but out of phase with the original. Previously, common phrases were not visible unless they occurred in the same place in both tunes.
I created this visualization using a variety of tools. I used Python to scrape and process the data. I used Beautiful Soup to scrape the data from thesession.org and music21 for a lot of the analysis and exploration of the data. I stored the data in a SQLite database.
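For a sense of what the storage step could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are hypothetical and are not the project's actual schema or data.

```python
import sqlite3

def create_tune_db(rows):
    """Create an in-memory database with a hypothetical tunes table and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tunes (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            tune_type TEXT NOT NULL,
            key_signature TEXT,
            abc TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO tunes (name, tune_type, key_signature, abc) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

# Placeholder rows: (name, type, key, notation)
rows = [
    ("Example Jig 1", "jig", "Gmajor", "GAB cdB"),
    ("Example Reel 1", "reel", "Dmajor", "dfaf gfed"),
    ("Example Jig 2", "jig", "Edorian", "EFG ABc"),
]
conn = create_tune_db(rows)

# Pull out just the jigs for analysis
jigs = [name for (name,) in conn.execute(
    "SELECT name FROM tunes WHERE tune_type = ? ORDER BY name", ("jig",)
)]
```

Keeping the tune type in its own column makes it easy to select a subset such as the jigs before running any comparisons.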