
Data Analysis for Cyclistic 

 

A Google Coursera Data Analytics Capstone Project

06

Lessons Learned

This was the first project I completed for public presentation, soup to nuts: from downloading, loading, and cleaning the data, to aggregating and analyzing it, to preparing key findings in an intelligible and attractive package. I challenged myself to create a capstone project that went beyond Google's minimum requirements for completing their data analytics course. Indeed, I was challenged, and in the course of completing the project I learned skills not included in the course, discovered possibilities for future growth, and identified gaps in my knowledge.

Cleaning datasets

Many data problems are either easy for the human eye to catch or easy to fix automatically with Python or Excel functions. Tasks like standardizing column names, converting strings into integers, consolidating multiple documents, looking for inconsistencies between aggregations, and creating custom columns can all be accomplished with simple Python code.
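As a minimal sketch of two of these tasks (standardizing column names and converting strings into integers), here is one way to do it with only the Python standard library. The column names and sample rows are made up for illustration and are not taken from the Cyclistic data; `standardize_name` and `clean_rows` are hypothetical helper names.

```python
import csv
import io
import re


def standardize_name(name):
    """Turn a header like ' Ride Length (min)' into 'ride_length_min'."""
    name = name.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)  # runs of other characters become "_"
    return name.strip("_")


def clean_rows(csv_text, int_columns):
    """Read CSV text, standardize headers, and convert the named columns to int."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [standardize_name(h) for h in next(reader)]
    cleaned = []
    for values in reader:
        row = dict(zip(headers, values))
        for col in int_columns:
            row[col] = int(row[col].strip())  # raises ValueError on bad input
        cleaned.append(row)
    return cleaned


# Made-up sample data, for illustration only.
sample = "Ride ID, Ride Length (min),Member-Casual\nA1, 12 ,member\nB2,45,casual\n"
rows = clean_rows(sample, int_columns=["ride_length_min"])
print(rows[0])
```

Letting `int()` raise on a value it cannot convert is a deliberate choice here: a loud failure points straight at the bad row instead of quietly passing it along.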

But when your dataset includes millions of rows, bad data has many places to hide. For example, after completing my project, I discovered that the list of stations included the entries "Special Events," "DIVVY CASSETTE REPAIR MOBILE STATION," and "DIVVY Map Frame B/C Station." These look like special cases for use by the company and not stations used by customers, and therefore they should have been filtered out of the analysis. It's not clear to me how I could have discovered these outliers with a Python or Excel function.

This example illustrates the importance in the real world of communicating with the team responsible for building the dataset so they can alert you to special cases like these.
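In hindsight, one partial approach might be to flag station names for manual review instead of trying to detect them outright. The sketch below is a heuristic, not a reliable method: it flags names that appear in very few rides, contain certain keywords, or are written in all caps. The `min_rides` threshold and the keyword list are assumptions (and the keywords were chosen already knowing the answer), the ride counts are made up, and "Example St & Sample Ave" is a hypothetical ordinary station.

```python
from collections import Counter


def flag_suspicious_stations(station_names, min_rides=5,
                             keywords=("repair", "test", "special", "map frame")):
    """Return station names worth a manual look, with the reasons each was flagged."""
    counts = Counter(station_names)
    flagged = {}
    for name, n in counts.items():
        reasons = []
        if n < min_rides:
            reasons.append(f"only {n} ride(s)")
        if any(k in name.lower() for k in keywords):
            reasons.append("matches keyword")
        if name.isupper():
            reasons.append("all caps")
        if reasons:
            flagged[name] = reasons
    return flagged


# Made-up ride counts; only the three odd station names come from the dataset.
rides = (
    ["Example St & Sample Ave"] * 40  # hypothetical ordinary station
    + ["Special Events"] * 2
    + ["DIVVY CASSETTE REPAIR MOBILE STATION"] * 1
    + ["DIVVY Map Frame B/C Station"] * 1
)
for name, reasons in flag_suspicious_stations(rides).items():
    print(name, "->", ", ".join(reasons))
```

The frequency check is the most general of the three, since it needs no prior knowledge of what the bad names look like. It would also flag legitimate but little-used stations, so the output is a list to read through, not a list to delete.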

Using AI

AI is a time saver, a teacher, and possibly a crutch. There were several times when I couldn't remember the exact syntax for the code I wanted, or I couldn't intuit how to order subqueries, and Claude AI stepped in to help. Sometimes the AI's solutions used methods I hadn't mastered yet, so I made a mental note of them and tried to use these solutions sparingly. I did this project for no reason other than to learn the fundamentals of data analytics, and solving the problems using AI would have defeated that purpose. It is easy to see, however, how a student could have effectively used AI to complete the Coursera course and "earn" the badge.

Getting better with SQL

While I had previous experience with the MySQL DBMS, for this project I decided to create a database with Postgres using pgAdmin 4. I didn't find any major differences between the two that mattered for my purposes (except for learning the hard way that Postgres treats single and double quote marks differently: single quotes mark string literals, while double quotes mark identifiers such as table and column names).

Several of my queries depended on SQL expressions that I do not feel I fully understand. In particular, I need to learn more about casting, common table expressions, and window functions (OVER with PARTITION BY). While I did not use them in any of this project's queries, I should also learn how to use PIVOT and UNPIVOT.
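For my own reference, here is a small sketch that combines the first three ideas in one query. It is not one of the project's queries: it runs against an in-memory SQLite database through Python's built-in `sqlite3` module, not Postgres, and the `rides` table, its columns, and its four rows are made up. The `WITH`, `CAST`, and `OVER (PARTITION BY ...)` syntax shown is standard SQL that Postgres also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ride_minutes is stored as text on purpose, so the cast has something to do.
conn.execute("CREATE TABLE rides (rider_type TEXT, ride_minutes TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [("member", "10"), ("member", "20"), ("casual", "30"), ("casual", "50")],
)

query = """
WITH typed AS (                                      -- common table expression
    SELECT rider_type,
           CAST(ride_minutes AS INTEGER) AS minutes  -- casting text to a number
    FROM rides
)
SELECT rider_type,
       minutes,
       AVG(minutes) OVER (PARTITION BY rider_type) AS avg_for_type  -- window function
FROM typed
ORDER BY rider_type, minutes
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

Unlike GROUP BY, the window function keeps every ride as its own row and attaches the average for that rider type alongside it, which is the part I found least intuitive.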

Building a project website with Wix

Using Wix to build this website was the most unexpected skill I learned during this project. I did not enroll in the Google Coursera course planning to build a website, nor did the course walk me through the process, but the last step of the capstone project is to share your work, and this seemed like a viable option.

The Wix editor was not immediately intuitive, but now that I've spent time with it I appreciate how organized it keeps your content, and how (relatively) easy it is to arrange things on the page. I'm glad that I won't need to learn JavaScript, at least for my personal projects. I also learned how to use the content management system to link dynamic page elements to CSV documents.
