Data Exploration Journal

Johanna Faith Malicdem
Nov 30, 2021
Today, Nov. 19, I am approaching the data set "Language and Communications Access," which I found on the Boston.gov website by searching the keywords "language," "accessibility," "communication," "accommodation," and "education." It was the only recent data set that touched on the topic I suggested to my group project partner: the progression of language accommodations in the greater Boston area. I am writing this after meeting with Professor Lina Giraldo during office hours, which turned out to be very fruitful. I started by asking her for advice on how to sort through the data, especially since the PDF does not link to any direct data sources. Instead, it reads as an overall guide: the product of a raw data set covering the Limited English Proficiency rates of residents in various Boston neighborhoods relative to the total number of Boston residents. The data set starts out looking like this.

It is difficult to find any raw data set in CSV or Excel form relating to my subject of research. The fact that the Language and Communications Access PDF is one of the most recently released reports on language and communication demographics in the city is telling about the initiative to begin with. Today is Nov. 22, and I am doubtful about the PDF I initially found because of how the data is laid out and how tedious it seems to manually transcribe and transfer all of it to an Excel sheet. However, Prof. Lina reassured me that the PDF actually provides an adequate amount of contextual and foundational data, and, miraculously, she had already figured out a way to sort through it. The following are the steps I took to arrive at my current state with my data.

First, I downloaded Adobe Acrobat, an application for editing Portable Document Format (PDF) files. Using this app, I deleted the extra text in the Language and Communications Access PDF that I did not need, so that the numbers could be processed and exported to Microsoft Excel more seamlessly.

Next, and Prof. Lina can take credit for this, I was able to copy and paste all of the fields and numbers into individual cells to mimic a standard table format.
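For anyone who would rather script this step than paste cell by cell, here is a minimal sketch in Python. It assumes the text copied out of the PDF has one neighborhood per line with the percentage as the last item on the line, which may not match the real layout. The function names, column headers, and sample rows are all hypothetical placeholders, not values from the report.

```python
import csv
import io


def lines_to_rows(text):
    """Split each non-empty line into [name, value] at the last run of whitespace."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2:  # skip blank lines and lines with a single item
            rows.append(parts)
    return rows


def rows_to_csv(rows, header):
    """Return the rows as CSV text that Excel or Tableau can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder text standing in for what was copied from the PDF.
copied_text = "Neighborhood A 10%\n\nNeighborhood B 20%"
table = lines_to_rows(copied_text)
csv_text = rows_to_csv(table, ["neighborhood", "lep_percent"])
```

Splitting from the right keeps neighborhood names with spaces in them in a single cell.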

Then, I created a pivot table to begin organizing this data set for a visualization. The pivot table uses the Limited English Proficiency (LEP) rate for each Boston neighborhood, given as percentages, which I converted to decimals.
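The percent-to-decimal conversion and the pivot can also be sketched in a few lines of Python. This is only an illustration of the idea, not what Excel does internally. The row layout (neighborhood, language, percent) is my assumption, and the neighborhood names and percentages below are made-up placeholders rather than figures from the report.

```python
def percent_to_decimal(text):
    """Convert a string such as '12.5%' to the decimal 0.125."""
    return float(text.strip().rstrip("%")) / 100


def pivot_by_neighborhood(rows):
    """Arrange (neighborhood, language, percent) rows as neighborhood -> {language: decimal}."""
    table = {}
    for neighborhood, language, percent in rows:
        table.setdefault(neighborhood, {})[language] = percent_to_decimal(percent)
    return table


# Placeholder rows; the real values would come from the cleaned spreadsheet.
sample_rows = [
    ("Neighborhood A", "Spanish", "10%"),
    ("Neighborhood A", "Chinese", "5%"),
    ("Neighborhood B", "Spanish", "20%"),
]
pivot = pivot_by_neighborhood(sample_rows)
```

Each neighborhood becomes one row of the pivot and each language one column, which is the same shape a pivot table gives in Excel.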

This is how I approached the visualization for this section of the Language and Communications Access data set. I plan to create an interactive density map that clearly represents the languages spoken by residents who need English language access, such as Allston's top three languages: Spanish, Chinese, and Russian.

Then, I uploaded this data to Tableau along with the Boston ZIP code Excel file from the Data Visualization course resources, found under the Files tab on Canvas. With both files uploaded, I was able to complete a map representing all Boston neighborhoods, but I have yet to delete the ZIP codes that are not represented in the Language and Communications Access data set. I also have to represent the density of each LEP percentage and label each neighborhood with a clear list of the top languages spoken in that area.
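The two remaining chores, dropping ZIP codes that have no LEP data and building a label for each neighborhood, could be sketched like this in Python. The field names ("zip", "neighborhood", "lep_rate"), the ZIP codes, and the rate are hypothetical placeholders, since I am not reproducing the course file here. Only the Allston language list comes from the report. ZIP codes are kept as text so that a leading zero is not dropped.

```python
def join_lep_to_zips(zip_rows, lep_by_neighborhood):
    """Keep only ZIP rows whose neighborhood appears in the LEP data and attach its rate."""
    joined = []
    for row in zip_rows:
        name = row["neighborhood"]
        if name in lep_by_neighborhood:
            joined.append({**row, "lep_rate": lep_by_neighborhood[name]})
    return joined


def format_label(neighborhood, languages):
    """Build a map label such as 'Allston: Spanish, Chinese, Russian'."""
    return f"{neighborhood}: {', '.join(languages)}"


# Placeholder ZIP rows and a placeholder LEP rate.
zip_rows = [
    {"zip": "00001", "neighborhood": "Neighborhood A"},
    {"zip": "00002", "neighborhood": "Neighborhood A"},
    {"zip": "00003", "neighborhood": "Neighborhood C"},
]
lep_by_neighborhood = {"Neighborhood A": 0.10}

mapped = join_lep_to_zips(zip_rows, lep_by_neighborhood)
label = format_label("Allston", ["Spanish", "Chinese", "Russian"])
```

Here the ZIP code for Neighborhood C is left out because it has no LEP rate, which is the cleanup I still need to do in Tableau.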

With this data, I have concluded that the best way to represent LEP rates in 2016 is a density map that clearly notes which languages are most spoken in each area.
