Jo:
On Nov. 19, I am approaching the data set "Language and Communications Access," which I found on the Boston.gov website by searching the keywords "language," "accessibility," "communication," "accommodation," and "education." It was the only recent data set that touched on the topic I suggested to my group project partner: the progression of language accommodations in the greater Boston area. I am writing this after meeting with Professor Lina Giraldo during office hours, which turned out to be very fruitful. Initially I asked her for advice on how to sort through the data, especially since the PDF does not link to any direct data sources and instead reads as an overall guide: a product of a raw data set covering the Limited English Proficiency rates of residents in various Boston neighborhoods relative to the total number of Boston residents. The data set starts out looking like this.
It is difficult to find any raw data set in CSV or Excel form relating to my subject of research. The fact that the Language and Communications Access PDF is one of the most recently released reports about language and communication demographics in the city is very telling about the initiative to begin with. Today is Nov. 22, and I am doubtful about the Language and Communications Access PDF I initially found because of how the data is laid out and because of how tedious it would be to manually transcribe all of the data into an Excel sheet. However, Prof. Lina reassured me that the PDF actually provides an adequate amount of contextual and foundational data, and she had already figured out a way to sort through the data. The following are the steps I took to arrive at my current state with my data.
First, I downloaded Adobe Acrobat, an application for editing Portable Document Format (PDF) files. Using this app, I was able to delete the text in the Language and Communications Access PDF that I did not need, so that the numbers could be processed and exported to Microsoft Excel more seamlessly.
Next, and Prof. Lina can take credit for this, I copied and pasted all of the fields and numbers into individual cells to mimic a standard table format. The screenshot below is a cleaned version of the data.
Then, I created a pivot table to begin organizing this data set for a visualization. The pivot table uses the Limited English Proficiency (LEP) rate of each Boston neighborhood, given as percentages, which I converted to decimals.
This is the section of the Language and Communications Access data set I am using for the visualization. I plan to create an interactive density map that clearly represents the languages spoken by residents who need English language access, such as Allston's top three languages: Spanish, Chinese, and Russian.
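The percentage-to-decimal conversion from the pivot table step was done by hand in Excel. For anyone who would rather script it, here is a minimal sketch in Python. The function name and the sample values are placeholders made up for illustration; they are not figures from the report.

```python
def percent_to_decimal(text):
    """Turn a string such as '12.5%' into the decimal 0.125."""
    cleaned = text.strip().rstrip("%").strip()
    return float(cleaned) / 100


# Placeholder values for illustration only; not figures from the report.
sample_rates = {"Neighborhood A": "25%", "Neighborhood B": "12.5%"}

decimal_rates = {
    name: percent_to_decimal(rate) for name, rate in sample_rates.items()
}
print(decimal_rates)
```

Keeping the rates as decimals means Tableau can format them as percentages itself, without the percent sign being read in as text.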
Then, I uploaded this data together with the Boston zip code Excel file from the Data Visualization course resources, found via the Files tab on Canvas. Once both were uploaded to Tableau, I was able to complete a map representing all Boston neighborhoods, but I have yet to delete the zip codes that are not represented in the Language and Communications Access data set. I also have to represent the density of each LEP percentage and label each neighborhood with a clear list of the top languages spoken in that area.
In the screenshot to the right are the neighborhoods that are in the zip code file that I will need to connect to Tableau properly.
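Finding the zip codes that have no match in the Language and Communications Access data can also be sketched in a few lines of Python. This is only an illustration of the idea, not how the join was actually done in Tableau. The field names ("zip", "neighborhood", "lep_rate") and every value below are hypothetical placeholders.

```python
def split_zip_rows(zip_rows, lep_rates):
    """Split zip-code rows into (matched, unmatched) by neighborhood name."""
    # Normalise names so that stray spaces or capitals do not block a match.
    normalised = {name.strip().lower(): rate for name, rate in lep_rates.items()}
    matched, unmatched = [], []
    for row in zip_rows:
        key = row["neighborhood"].strip().lower()
        if key in normalised:
            matched.append({**row, "lep_rate": normalised[key]})
        else:
            unmatched.append(row)
    return matched, unmatched


# Placeholder rows for illustration only; not values from either file.
zip_rows = [
    {"zip": "00001", "neighborhood": "Neighborhood A"},
    {"zip": "00002", "neighborhood": "neighborhood a "},
    {"zip": "00003", "neighborhood": "Neighborhood C"},
]
lep_rates = {"Neighborhood A": 0.25}

matched, unmatched = split_zip_rows(zip_rows, lep_rates)
print(len(matched), "rows keep an LEP rate")
print("Zip codes to drop:", [row["zip"] for row in unmatched])
```

The unmatched list is the set of zip codes that would need to be deleted or filtered out before the map is final.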
With the LEP rates, zip codes, and neighborhoods accurately entered in Excel and Tableau, I figured it would also be helpful to clearly state the top languages in each neighborhood. This way, when a viewer hovers their cursor over a neighborhood, they can learn its LEP rate as well as the languages that dominate that neighborhood.
We decided that blue would be the most effective color for communicating the concept of language, as we agreed that it is the color we think of when we think of language.
Next, I titled, captioned, and annotated the visualization, highlighting the languages spoken in each neighborhood and spotlighting the two neighborhoods with the highest Limited English Proficiency rates. I also cited the source of the data, Boston’s Language and Communications Access report.
To round out the information provided in said report, I created two more visualizations. One of them helps to visualize the number of people with Limited English Proficiency, as well as the number of residents who speak another language at home. To do this, I created my own Excel sheet, with one column for the residents field and another for the population count. I then imported this sheet into Tableau, where I was able to make a tree map. The bottom right annotation helps viewers understand the definition of Limited English Proficiency.
For my final visualization, I realized it would also be beneficial for readers to see the LEP rates of Boston neighborhoods in relation to the total population of the city of Boston.
For this, I manually entered the Language and Communications Access data in an Excel spreadsheet, then brought it into Tableau to make a bar graph. Then, I sorted the values in descending order to make the visualization clearer and more digestible.
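The sorting was done with Tableau's own sort option, but the same ordering is easy to check outside Tableau. Below is a minimal Python sketch, assuming the rates are stored as decimals keyed by neighborhood. The function name and the numbers are placeholders, not values from the report.

```python
def sort_descending(rates):
    """Return (neighborhood, rate) pairs ordered from highest rate to lowest."""
    return sorted(rates.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only; not figures from the report.
sample = {"Neighborhood A": 0.10, "Neighborhood B": 0.30, "Neighborhood C": 0.20}

for name, rate in sort_descending(sample):
    print(f"{name}: {rate:.0%}")
```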
In terms of limitations, it was difficult to access any recent data regarding language accessibility and accommodation in the city of Boston. Despite calling at least three times every morning for the last few weeks, I was unable to meet with an expert from a community organization about the accommodations they were making to supplement those offered by the city. Because we were not able to get in touch with community organizations, our story and our data have a very limited scope in terms of what we can take away from the numbers and visualizations. After calling and emailing both the Language and Communications Accessibility Department at the city of Boston and Main Street and community organizations countless times, I was unable to get a response or a "yes" to an interview about the project. It was disheartening that covering a story like this was so challenging because of the lack of responses, especially considering how important a story language accommodation is in a diverse metropolitan city.
Luz:
The next data set I am working with is “Massachusetts Non-English Speakers,” which one of my group members found and suggested as a foundation for estimating how diverse the state of Massachusetts is before narrowing the focus to the Boston area. The data set gave me and my group partners an idea of how diversity in Massachusetts progressed from 2016 through 2019.
This graph was easy to work with because it already included the option to view and download the dataset used to create the visualization. Once I downloaded the dataset as a CSV, I opened it in Excel to get an idea of the graph and the information I was working with. I was able to read the data without problems because it was properly cleaned and formatted. I created a table in order to filter and read the data in various ways. After cleaning the missing parts of the data, I opened Tableau and created a visualization based on the non-English speakers in Massachusetts. I had to play around with the tables because each dataset is different, and it took me a while. I am not going to lie: even though it looks easy, I did struggle a little and had to restart multiple times. It took me approximately three sheets in Tableau to recreate the foreign speakers population. I first dragged Language Spoken at Home to Columns and then ID Language Spoken at Home to Rows. I then placed ID Language Spoken at Home on Color to show the difference in amount from lightest to darkest. Language Spoken at Home was then placed under Label. I then titled the story according to its dataset. This was still incorrect, and it was not until Professor Lina helped me understand what I was working with that I could organize the data correctly.
I also compared ID Language Spoken at Home by year, from 2016 through 2019. I used the same dataset to create the new data visualization after the professor helped me understand how to organize the data being presented.
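The year-by-year comparison can also be sketched outside Tableau. Here is a minimal Python example that reads CSV text and totals speakers per language for each year. The column names ("Year", "Language Spoken at Home", "Speakers") are assumptions about how the download might be laid out, and the rows are made-up placeholders, not numbers from the dataset.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; the column names are assumptions.
SAMPLE_CSV = """Year,Language Spoken at Home,Speakers
2016,Language A,100
2016,Language B,40
2019,Language A,120
2019,Language B,60
"""


def speakers_by_year(csv_text):
    """Total the speakers of each language within each year."""
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Year"])
        language = row["Language Spoken at Home"]
        totals[year][language] = totals[year].get(language, 0) + int(row["Speakers"])
    return dict(totals)


for year, languages in sorted(speakers_by_year(SAMPLE_CSV).items()):
    print(year, languages)
```

Laying the data out this way, one group per year, mirrors what placing the year field on Columns does in Tableau.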
The next dataset I worked with was the “New Bostonians Demographic Report” by The Mayor’s Office of New Bostonians. This dataset was very descriptive in terms of total sums, but I had difficulty narrowing it down to the details of each subgroup. I tried to create an Excel spreadsheet manually, incorporating pieces of information to visualize the New Bostonians by country of origin, by top percentage, for the 2020 census.
Emma:
After Jo found the Language and Communications Access data set, we decided to make it the focus of the project. We then set out to find other data sources to support our story. Jo was able to find the “Breaking the Language Barrier” report, and I decided to take on researching it. This included reading through, cleaning, organizing, and creating visualizations with the data presented. Two sections stood out to me as great additions to our story. The first was “Number of Students Served by Boston Neighborhood.” I thought it would be interesting to add an educational angle to the narrative.
It really helped that they already had a clear chart made because it made converting the PDF into an Excel spreadsheet using Adobe Acrobat pretty seamless.
Once I cleaned and organized it in Excel, I was able to upload it in Tableau and create a chart showing how many students in each neighborhood of Boston are receiving accommodations. I thought a chart was best because we already have a map for a visualization and you can more easily see the disproportionate numbers in each neighborhood.
The second section of the data source that I found useful for our narrative was one that focused on gaps in the current system. This one was titled “Suggested Program Improvements Cited by Surveyed Greater Boston ESOL Providers”.
Again, converting this into an Excel spreadsheet was pretty seamless. I decided to create an interactive pie chart with this dataset because you can really see how badly people need flexibility with accommodations while still showcasing the other issues that need to be addressed.