
Behind the Build: Code Review 2023

January 4, 2024 code by akhil

As 2023 came to a close, we went through data about learners in our community and pulled together some of the year’s biggest highlights in a recap we call Code Review. It’s a data-driven deep dive into the topics that learners like you connected with, the projects you worked on, and the courses you spent the most time with this year.

To pull off Code Review, we aggregated anonymized data for our learners to surface the trending tools, topics, and courses over the past year. You can find the full recap and explore more of our findings on the blog, like the top coding courses from 2023, the most popular projects and Docs, and an interview with a learner who’s maintained an eight-year learning streak.

If you’re interested in data science, you might be curious just how we pulled, analyzed, and found stories within this data. Code Review was a cross-departmental collaboration between our product, data, creative, and engineering teams. Ahead, our Product Marketing Lead Donté Ledbetter and Data Analytics Manager JR Waggoner explain how Code Review went from a hackathon idea to a reality.


The project: Highlight the progress our learners made in 2023 through data

We wanted to look back and identify the most interesting and entertaining data about our learners’ habits and how they engaged with our learning platform during the past year. “Spotify set the trend back in 2016, so a lot of companies do this sort of thing,” Donté says. “People like to feel like they’re making progress, and this was a good way to show the progress our learners made throughout the year.”

The main tasks our team had to accomplish to create Code Review were:

Gather anonymized data from our data pipeline
Model the data with SQL and dbt (aka data build tool)
Find a creative way to share the findings with our community

Investigation and roadmapping

Donté: “We started talking to the data science team in May to see what this could be like, and if we even had the data available to aggregate and send to people. [My role] was basically just being the project manager: making sure it got done and doing the legwork up front to convince people it was worth doing. Once we decided we were going to do this, I gave guidance on its execution and marketing impact.

The idea had been pitched in hackathons for years, so I basically pushed it forward to get it done this year. It’s cool when you get to show people the progress they’ve made, because in edtech, people want to feel like they’re making progress.”

Implementation

JR: “The data team drove most of the technical side of the project. We started out in more of an advisory capacity as Donté and the product team threw out different ideas. Once we moved out of that initial planning phase, we clicked into going out and gathering all of the different assets and metrics. We were kind of just going on an adventure to find the data, throw some basic statistics at it, and see what we could do.

Some of the things that we were asked for are our bread and butter, things we do every day. But there were some more interesting questions that had never really been asked before. Those questions did require some new data modeling — going out and finding new stuff and figuring out how to work with it. They weren’t super challenging [asks], more ‘how do we answer this question’ than ‘do we have the data for this.’

We have a pretty intricate data pipeline that we use to measure how users like to interact with our site or interact with our content — enrolling in courses, submitting code, working on projects, etc. In some cases, simple SQL queries were all we needed. For the more interesting/complex questions that required more complex data modeling, like those that involved a series of complex interactions over time, we generally turned to dbt. dbt is a great tool for manipulating data and modeling complex data pipelines, and we use it to power most of the analytics at Codecademy. So, dbt is where we did most of the data heavy lifting for this project.”
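To make “a series of interactions over time” a bit more concrete, here is a minimal Python sketch of one metric of that kind: a learner’s longest streak of consecutive active days. It is an illustration only, not the dbt model the team built. The function name longest_streak and the sample dates are made up for this example.

```python
from datetime import date, timedelta


def longest_streak(active_days):
    """Return the longest run of consecutive calendar days in active_days."""
    days = set(active_days)  # several events on the same day count once
    best = 0
    for day in days:
        # Only start counting from the first day of a run.
        if day - timedelta(days=1) in days:
            continue
        length = 1
        while day + timedelta(days=length) in days:
            length += 1
        best = max(best, length)
    return best


# Hypothetical activity log for one anonymized learner (made-up dates).
events = [
    date(2023, 3, 1), date(2023, 3, 2), date(2023, 3, 2),
    date(2023, 3, 3), date(2023, 3, 7), date(2023, 3, 8),
]
print(longest_streak(events))  # prints 3 (March 1 through March 3)
```

In a warehouse, the same idea would more likely be expressed as a SQL model over an activity table. The Python version just shows the logic: de-duplicate to one row per day, find where each run starts, and measure how long it lasts.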

Troubleshooting

JR: “Data engineering is sometimes very much like traditional software development. Other times, it’s the Wild West. You’re taking this list of metrics or data points that the team has compiled and converting them into what we know from the data.

For the most part, we know where to find the data points we want to find. From there, we got into the more technical part of writing whatever code we needed to write to make sense of the data, making sure we captured all of the metrics, checked all the boxes, and brought it all together into a single, easy-to-consume model for the project.

We wanted to explore a couple of opportunities in areas we’ve never been before, but the value of doing so would have to far outweigh the lift required. We understand a lot about how learners engage in the learning environment (what they submit, how long they spend on different tasks, etc.), but even today, there are still some aspects of how users interact with our content that we’ve never explored.

There were some crazier, more aspirational ideas from the outset, but we couldn’t scope what getting there would even look like. Aside from that, the biggest challenge was general noise. There are always these weird edge cases that are really hard to pinpoint on both ends of the spectrum. Is this a power user or a Google spider crawling all of our web pages for SEO?”
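The power-user-or-crawler question can be sketched with a simple heuristic. The Python below is a rough illustration under stated assumptions, not the filtering the data team actually used: the DailyActivity fields, the BOT_MARKERS substrings, and the 500-page-view threshold are all hypothetical.

```python
from dataclasses import dataclass

# Assumed substrings; real crawler detection would need a maintained list.
BOT_MARKERS = ("bot", "spider", "crawl")


@dataclass
class DailyActivity:
    visitor_id: str
    user_agent: str
    page_views: int
    code_submissions: int


def looks_like_crawler(row, max_views_without_code=500):
    """Heuristic: a self-identified bot, or many page views with no code submitted."""
    agent = row.user_agent.lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return True
    return row.page_views > max_views_without_code and row.code_submissions == 0


# Made-up rows: one declared crawler, one heavy learner, one silent scraper.
rows = [
    DailyActivity("a1", "Mozilla/5.0 (compatible; Googlebot/2.1)", 4000, 0),
    DailyActivity("b2", "Mozilla/5.0 (Windows NT 10.0)", 900, 35),
    DailyActivity("c3", "Mozilla/5.0 (X11; Linux x86_64)", 1200, 0),
]
learners = [r for r in rows if not looks_like_crawler(r)]
print([r.visitor_id for r in learners])  # prints ['b2']
```

The second rule reflects the idea that a real power user leaves traces a crawler does not, such as submitted code. Any threshold like this would need tuning against real traffic, and it will still misclassify some edge cases on both ends.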

Ship

JR: “After we assembled everything and it looked really good, the next step was getting the data in front of the rest of the team, which meant building a bunch of views in Looker. That was a pretty big step because that’s when we really started to iterate, fine-tune, and refine our perspective of what was actually possible. In some cases, we just wouldn’t have had the data ready in time to support some of the original asks. In other cases, we wandered into these really interesting pockets of engagement, but they were so nuanced that we wouldn’t have been able to share very much about them. It was a very engaging, iterative, back-and-forth process to get everything dialed in until the whole team was happy with how things looked.”

Donté: “The data science team created a Looker dashboard, so we sent it to the creative team, marketing channel owners, and the CRM [Customer Relationship Management] team. We held a brainstorm with our creative team to come up with different ideas and marketing activation tactics. We used a big Figjam and talked about how Code Review could come alive on social media and in the blog, and how we might create some virality and excitement among our learners.”

Retrospective

JR: “We had people on the marketing side thinking about strategy and use cases, folks on the product side experimenting with different perspectives on engagement and versions of the project, and we worked closely with marketing, engineering and CRM to figure out how to best share.”

Conversation has been edited for clarity and length.
