Text analysis in History Harvests

Research Question

  • How can the analysis of diction and syntax help researchers connect seemingly different individuals or groups?

By dissecting the transcribed interviews of our history harvest participants, we have begun to build a broader understanding of the speech patterns people use to describe their unique objects. Across varying socio-economic, political, geographic, and cultural backgrounds, the individuals consistently attached sentimental meaning to objects of all shapes and sizes. Understanding the relationship between owner and object, particularly the way owners speak about their objects, is key to understanding how the pair interact daily. With this in mind, we aim to connect these daily interactions with the descriptive speech patterns that accompany them. In essence, we are slowly building a model that could predict how dissimilar individuals might come together over similar feelings, actions, and object use. Using the city of Bloomington as our example, we intend to identify and explain any trends that emerge.

Process of analysis

  • Collection of transcripts

The collection of the transcripts went smoothly. Thanks to the class, and to additional help transcribing the interviews early in the process, it was as simple as having the finished transcripts added to our Text Analysis Box folder.

  • Separate objects into three different groups: Handcrafted, Manufactured and The Outlier

Creating the object groups and sorting the objects into them was not demanding. It was, however, time-consuming to think carefully about which groups best represented the body of objects and then to sort the objects accordingly.


  • Group and upload transcripts onto Voyant

Uploading the transcripts into Voyant was an area the group initially struggled with. The reason for this? Unfamiliarity with the tool at hand. We weren’t sure how to approach uploading multiple transcripts into Voyant. At first, we sorted all the transcripts by item type and then copied and pasted them into a single document for each group. However, we also had to add new transcripts as they came out of the transcription process. This meant adding to documents we had already sorted, which is where our troubles officially began. It was challenging to look at our sorted Word documents and pick out which transcripts were present and which were not. Additionally, how would we load these new files into the corpus we had already been working with on Voyant? We wouldn’t. Once a file or multiple files have been uploaded into Voyant, you cannot update them later. When we learned this, we switched our approach: we made a folder for each group and sorted the transcripts into those folders. The benefits were that we could see which files were in which group, it was easy to add files to or remove them from any folder, and we could upload each whole folder onto Voyant.
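As a hedged illustration of why the folder-per-group layout helped, a short Python sketch can list which transcripts sit in which group folder and flag any that have not been filed yet. It assumes the transcripts are saved as plain-text `.txt` files inside one folder per object group; the function names `inventory` and `missing_transcripts` are hypothetical and are not part of Voyant.

```python
from pathlib import Path


def inventory(root: str) -> dict[str, list[str]]:
    """Map each group folder under root to the sorted .txt transcripts it holds."""
    base = Path(root)
    return {
        folder.name: sorted(path.name for path in folder.glob("*.txt"))
        for folder in sorted(base.iterdir())
        if folder.is_dir()  # loose files directly under root are ignored
    }


def missing_transcripts(root: str, expected: list[str]) -> list[str]:
    """Return expected transcript file names not yet filed in any group folder."""
    filed = {name for names in inventory(root).values() for name in names}
    return sorted(set(expected) - filed)
```

With a hypothetical layout such as `Transcripts/Handcrafted/`, `Transcripts/Manufactured/`, and `Transcripts/Outlier/`, calling `missing_transcripts("Transcripts", expected_names)` would list the transcripts still waiting to be sorted, which is the question the single combined documents made hard to answer.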

  • Make a list of stop words for each of the different groups using Voyant

In computing, stop words are very common words in a language (such as “the,” “and,” or “like”) that are filtered out before text is processed. Removing them before processing the data gives cleaner, more useful results from a program like Voyant. While stop word lists are supposed to be helpful, our group struggled with them at first. We expected them to be relatively straightforward and to provide definite results. We did not expect the difference between a straight apostrophe (') and a curly apostrophe (’) to cause problems when spelling contractions on our stop word lists. Human readers usually do not even notice the difference between the two, but computers treat them as different characters. We first noticed the problem when contractions such as “can’t” and “don’t” appeared as some of the largest words in the word clouds (meaning they were among the most frequently used). We learned that we could not afford any slip-ups on our stop word lists, or our results would be incorrect. Because of this, the group created a cheat sheet of sorts: a stop word list containing all the common words and contractions we had previously spent time filtering out. This let us copy and paste the cheat sheet into each new Voyant corpus and save time in the filtering process.
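The apostrophe problem described above can be reproduced, and avoided, in a few lines of Python. This is a minimal sketch, not how Voyant handles stop words internally: it assumes we tokenize the text ourselves, and `CHEAT_SHEET` is a hypothetical stand-in for the group’s real list. The key step is normalizing curly apostrophes to straight ones in both the transcript and the stop word list before comparing them.

```python
import re

# Hypothetical stand-in for the group's "cheat sheet" stop word list.
CHEAT_SHEET = {"the", "and", "like", "i", "it", "a", "to", "can't", "don't"}

# Map curly (typographic) apostrophes and the modifier-letter apostrophe to "'".
APOSTROPHES = str.maketrans({"\u2019": "'", "\u2018": "'", "\u02bc": "'"})

# A word is a run of letters (accented letters included), optionally joined by
# apostrophes, so contractions like "can't" stay a single token.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def normalize(text: str) -> str:
    """Lowercase text and replace curly apostrophes with straight ones."""
    return text.translate(APOSTROPHES).lower()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens."""
    return WORD.findall(normalize(text))


def filter_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop tokens found on the stop word list, normalizing the list the same way."""
    stops = {normalize(word) for word in stop_words}
    return [token for token in tokens if token not in stops]


transcript = "I can\u2019t say, but I don't know."  # curly and straight apostrophes mixed
print(filter_stop_words(tokenize(transcript), CHEAT_SHEET))  # ['say', 'but', 'know']
```

Without the `normalize` step, the curly “can’t” in the transcript would not match the straight `can't` on the list, which is exactly the kind of mismatch that let contractions slip into the word clouds.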

Object Groups: Handcrafted, Manufactured and The Outlier

Handcrafted


In this word cloud, words such as family, culture, special, and home stand out from the rest. Individually they might not mean much, but grouped together in this context, it is clear why they are there. Each word points to the circumstances or intentions that produced these objects. People take pride in the items they handcraft just as they take pride in their culture and place of birth; individual worldview greatly shapes how people express themselves.

Manufactured


Bloomington, birthday, school, friends, and community emerge as the primary words in the manufactured word cloud. Rather than being crafted by their owners, these items were bought by them or gifted to them, so the connection to the owner is different. Here the words point to the owner’s use of, or attached meaning for, the item. Birthday most likely refers to how individuals received the item, while Bloomington, school, friends, and community suggest that manufactured items let their owners see themselves as part of the community through their actions with their individual objects.

The Outlier


Malachi the dog is much more than a good boy; he is the most unique item in our collection. What makes Malachi so different from the rest of the items? Malachi was neither handcrafted nor manufactured. As the only living organism in the collection, he is in a class of his own.

Other Groups for the Objects

As we looked at the objects more closely, we saw that they could also be grouped into smaller categories: Tradition/Ritual/Habit, Accomplishments, International Culture, Family, and Art/Artwork.

These groups were chosen because of the way their contributors talked about their individual objects and the main focus words that appeared when we ran a meta-corpus analysis. In the Handcrafted category above, some of the largest words are family, culture, home, and tattoo, and similar words appear in the Manufactured category. Based on these, we decided that the categories above could yield another level of depth.

Note that each of these word clouds contains the 55 most common words for its group of transcripts, after the stop word list was applied. We also decided to keep the word “people” off the stop word list, because we felt it was important to show how our contributors viewed themselves in relation to other people and cultures.
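The ranking behind each cloud, the 55 most frequent words left after stop words are removed, can be approximated in Python with `collections.Counter`. This is a minimal sketch under stated assumptions: it tokenizes the text itself rather than reproducing Voyant’s tokenizer, so its counts may differ slightly from Voyant’s, and the sample sentences and stop word set below are invented for illustration. Leaving “people” off the stop word list, as the group did, simply means it is counted like any other word.

```python
import re
from collections import Counter

# Treat curly apostrophes as straight ones so contractions match the stop list.
APOSTROPHES = str.maketrans({"\u2019": "'", "\u2018": "'"})
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def top_words(transcripts: list[str], stop_words: set[str], n: int = 55) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop words across one group's transcripts."""
    stops = {word.translate(APOSTROPHES).lower() for word in stop_words}
    counts = Counter()
    for text in transcripts:
        tokens = WORD.findall(text.translate(APOSTROPHES).lower())
        counts.update(token for token in tokens if token not in stops)
    # Counter.most_common returns (word, count) pairs, highest count first.
    return counts.most_common(n)


# Hypothetical sample group; "people" is intentionally not a stop word.
group = ["Family is home. My family gave me this.", "People in my culture, people here."]
stops = {"is", "my", "me", "this", "in", "here", "gave"}
print(top_words(group, stops, n=2))  # [('family', 2), ('people', 2)]
```

In a word cloud, these counts set the relative size of each word, which is why the most frequent remaining words appear largest.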

Tradition/Ritual/Habit

Word Cloud for Tradition/Ritual/Habit Items

When we look at these items, we can see words like wearing, started, hunting, Christmas, and birthday appear. Alone, these words may not seem like much beyond holidays and activities. However, when words like phone, lipstick, coffee, friends, and family appear just as large, it tells us that all of these objects relate to some ritual, tradition, or habit. Traditions, rituals, and habits play a daily role in our lives. Take coffee: it is no secret that many people drink coffee daily, whether before work, at work, at school, or in a coffee shop. We can classify this as both a habit and a ritual: a habit because it happens every day, and a ritual because it is part of a morning routine. Birthdays and Christmas are where tradition comes into the fold. Many people have their own traditions for special occasions, such as going to church and hanging lights for Christmas, or lighting and blowing out candles at a birthday celebration. Both of these also cross into ritual, through singing songs (“Happy Birthday”) or a candle or tree lighting during a Christmas celebration at home or at church.

Accomplishments

Word Cloud for Accomplishments Items

In this word cloud, we can see words like undergraduate, school, special, degree, graduation, and celebrate appear. Even though the sample size for this category is small, these words show that all of these objects relate to accomplishments in some manner. We have a Ph.D. pin, graduation sashes, a candle from a graduation ceremony, and a published book, each marking a major moment in its contributor’s life.

International Culture

Word Cloud for International Culture Items

In this word cloud, the first words that stand out are Brazil, samba, culture, dance, and different. From this, we can see that international culture played a big role in the ways people talked about their objects. Brazil was not the only place where these objects had cultural significance; items connected to Japan, Mexico, Africa, and Puerto Rico were also significant to their owners. This showed us several things, the most profound being that people who live in and feel connected to the Bloomington/Indiana University communities also feel connected to the cultures where their items originated, or to the cultures whose lives and festivities the items are traditionally part of.

Family

Word Cloud for Family Items

As anticipated, family is the largest word in this word cloud. Other prominent words are mom, cousins, kids, auntie, and dog. This cloud tells us that people view their objects as having a connection to their family, whether in the classic sense of family (mom, dad, siblings, etc.) or whoever they consider family. It also shows how something like a painting, a dog, or a blanket can remind someone that they are part of a larger, more personal community.

Artwork

Word Cloud for Artwork Items

Lastly, in this word cloud, we can see a variety of words appear: tattoo, plate, cleats, and alebrija, along with colors and words for how art is expressed: colors, black, show, and blue. This category represents the different ways someone expresses their identity. People brought us objects that represented culture, connections, and community in many forms: tattoos, a plate that doubles as a musical instrument, a beach wrap that represents Oshun (a goddess in Brazilian culture), and a picture of the Gorn from Star Trek that expresses a continuing love of inside jokes between a man and his mother.

Relating our work back to the research question

Processing our transcribed interviews through Voyant gave us the ability to illustrate a visual connection between the categories of items and the spoken words of their owners or creators. The word clouds revealed hidden connections between items that individually appear quite different. These connections were shown by the largest words in each word cloud: the larger the word, the more times it was used in the data set. Once the stop word lists were in effect, common words such as like, the, and can’t were removed, leaving the most meaningful words across all files in that category to present themselves. From there, this group of meaningful words can collectively be used to draw a conclusion; no single word can do it alone.

Featured artifacts in this exhibit: Text analysis in History Harvests

Guacardo

Vanessa L.

Green avocado toy with a face, arms, and legs. A touch of brown on the torso represents the seed.

Pano

Tattu

Embroidered purple, red, pink, blue, orange, and green motifs on a black fabric. The black fabric is surrounded by a gray border.

Van's Era

J’essence Reynolds

Colorful and vibrant low-top shoes with a white sole and white laces. They have black and white checkers with a yellow stripe running through the shoe. It also has...