Customer Story

Overcoming E-commerce Data Quality Challenges with Elementary

Kaiyo is a used-furniture e-commerce marketplace on a mission to make the furniture industry more sustainable.

Industry

E-Commerce

Company size

250

Author
Macklin Fluehr

E-commerce data transformation

Kaiyo is a used-furniture e-commerce marketplace on a mission to make the furniture industry more sustainable. Like any e-commerce company, we see understanding and improving site functionality as critical to customer satisfaction and product growth.

In 2023, the company decided to double down on Product (i.e. our website). Pretty quickly, in conversations about the ambitions of our Executive and Product teams, the data team identified a tracking gap in our out-of-the-box web analytics tools. They simply weren’t going to cut it. While great for many general use cases, the kinds of changes we wanted to understand, experiment with, personalize, and build on just weren't capturable with an off-the-shelf tool. We were going to need to build some detailed tracking infrastructure.

We decided to invest in our own tracking stack, the foundation of which we built on top of Segment. These new Segment events let us dive as deep as we needed to, though they required a lot of careful planning, design, and software engineering capacity to implement. Armed with Segment and dbt, our data team embarked on a marathon effort over 8-12 months to craft detailed, insightful models and reports for the business.

The Front-End Data Dilemma

Our project was definitely ambitious. The Greek myth of Icarus feels appropriate here. Quickly enough, we found that tracking basic components on the website, which our existing out-of-the-box tools already handled, wasn’t so hard (flying low). Tracking the level of detail we were shooting for while monitoring and maintaining data quality, however, proved incredibly difficult (flying near the sun, crashing, and burning).

For context, data quality was something the data team at Kaiyo had historically been very good at. Kaiyo manages its own operations and warehousing, and we had built hundreds of dbt models and reports on these operational use cases. Those models and reports were easily auditable, thanks to the eyes of both our data team and our stakeholders, and the structured nature of operational data lent itself well to dbt testing. We considered ourselves pretty avid dbt testers, having built dozens of custom schema tests and hundreds of cross-tests, and having attained a relatively high test-to-model ratio.

Front-end data, however, was a different beast. Generated by JavaScript snippets scattered across web pages, it lacked a standardized structure for validation, was vulnerable to browser inconsistencies, and fluctuated constantly for all sorts of reasons. Segment did not offer great tools to manage quality, and dbt didn’t offer great options to enforce it either. Where we could confidently enforce a `not_null` or an `accepted_values` test on our operational models, on our front-end tables we could only hope for the best.
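
To make that concrete, here is roughly the kind of schema test we could lean on for operational models; the model, column, and accepted values below are illustrative, not our actual schema:

```yaml
# Illustrative dbt schema tests of the sort that worked well on structured,
# operational data. Model, column, and accepted values are made up.
version: 2

models:
  - name: warehouse_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: order_status
        tests:
          - accepted_values:
              values: ['received', 'processing', 'shipped', 'delivered']
```

On front-end event tables, there was rarely a stable set of accepted values or a reliably populated column to pin tests like these to.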

Our Approach to Quality Assurance

As our first response, we initiated a stringent QA process for every product release. The process was collaborative, involving both our data team, who developed detailed tracking specifications for each product change, and a dedicated QA engineer from the software development team. While this dramatically reduced miscommunications and discrepancies around new feature launches, tickets still slipped through the cracks. With dozens of tickets shipping from our software team each week, not all of which laddered neatly into a single feature launch, it was simply too difficult for the data team to QA them all.

Our second response was to beef up our dbt tests. Taking inspiration from the dbt_utils.not_null_proportion test, we created a new suite of schema proportion tests and custom rolling-window tests to try to catch data discrepancies as they occurred. While this made us feel better psychologically and gave us some training wheels, we found ourselves constantly changing thresholds on these tests as seasonal and intentional product changes “messed up” our data. Many of the tests reached the point where either the threshold was so wide that it was no longer clear what the test was doing, or the custom logic had grown so complex that we questioned whether the maintenance cost was worth it.
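
For illustration, these thresholded tests looked something like the sketch below (the model, column, and threshold value are hypothetical); the recurring problem was that the "right" value of `at_least` kept drifting with seasonality and intentional product changes:

```yaml
# Illustrative proportion test of the kind we kept re-tuning.
# Model, column, and threshold are made up for the example.
version: 2

models:
  - name: web_product_viewed
    columns:
      - name: anonymous_id
        tests:
          - dbt_utils.not_null_proportion:
              # Started out strict; kept getting loosened every time a seasonal
              # shift or an intentional product change tripped the test.
              at_least: 0.90
```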

The Turning Point - A Critical Data Incident

Things were going pretty okay, albeit with a lot of manual work from our data and QA teams. We thought we had things relatively under control, despite feeling pretty dissatisfied with the tools at our disposal.

Unfortunately, about three months in, we had a pretty bad day. A seemingly minor software patch had inadvertently broken our cookie tracking mechanism, grossly inflating our user traffic metrics. The change landed during a time of year when we’d expect traffic to increase anyway (though not that dramatically), so people viewing the metrics wrote it off as growth.

Discovering the bug a month later, and having to ask stakeholders to revise their mental maps of our peak season, was something of a death blow to our QA system. We were pouring manual effort into the process only to face mounting doubts about our data's reliability (usually unfounded, but in this case very much founded), and it became clear a significant change was necessary.

Seeking a Solution

We quickly put together an RFP for a tool with the following requirements:

  • Anomaly detection of metrics across our stack:
    • Volume
    • Not Null
    • Distribution changes
    • Column Type Changes
  • Preferably compatible with, or built into, dbt
  • Not more expensive than our whole existing data stack

Funnily enough, meeting even half of these requirements was somewhat difficult. The requirements most closely describe data observability tools, which, as of 2023, were all pretty young, were often developed without dbt in mind, or cost tens of thousands of dollars a year (i.e. they were built for enterprise companies, not startups trying to accomplish what an enterprise company does). We found that many tools could do the anomaly detection but were not version-controllable and had no real dbt integration.

Our search involved evaluating several data observability tools, each with its own strengths and limitations: Monte Carlo, Synq, Re:data, and Elementary. Their integration complexity, cost, and focus areas were often misaligned with our needs. Generally, they were too expensive for the value they delivered, and when they weren’t expensive, the amount of configuration needed to get them working was like boiling the ocean. We didn’t have the money or the people on staff to get any of these off the ground, which was frustrating, because our team had directly experienced how much we could build with dbt without needing lots of analysts or data engineers. How was it 2023 and we still couldn’t QA front-end web events?

Eventually, Elementary emerged as the frontrunner, checking all three boxes of our criteria: it was affordable, it was dbt native, and it had anomaly detection tests that you didn’t have to babysit all day.

Implementing Elementary

Integrating Elementary into our ecosystem marked the beginning of a new chapter for our front-end tracking. We added basic anomaly detection tests across all of our sources and created an account with Elementary Cloud to centralize our test failure reporting.
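
For reference, configuring Elementary's anomaly tests on a source looks roughly like the sketch below; the source, table, and column names here are illustrative rather than our actual schema:

```yaml
# Illustrative Elementary anomaly tests on a Segment-style event source.
# Source, table, and column names are made up for the example.
version: 2

sources:
  - name: segment_web
    tables:
      - name: pages
        tests:
          # Flags unexpected spikes or drops in row volume over time.
          - elementary.volume_anomalies:
              timestamp_column: received_at
        columns:
          - name: anonymous_id
            tests:
              # Flags anomalies in column-level metrics such as null counts,
              # rather than relying on a hand-tuned static threshold.
              - elementary.column_anomalies:
                  column_anomalies:
                    - null_count
```

The appeal for us was that these tests infer expected ranges from historical behavior instead of relying on a hand-maintained threshold.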

Ultimately, this integration gave us a far more proactive approach to identifying discrepancies, enabling us to detect anomalies within 24 hours of a product release, compared with the one week to one month our previous process took. We were also able to scrap most of our buggy, custom dbt thresholding tests and give our team back time to work on more impactful projects.

Integrating Elementary into our ecosystem marked the beginning of a new chapter for our front-end tracking.

Macklin Fluehr
Head of Data at Kaiyo

The Outcome

The introduction of Elementary significantly enhanced our front-end data quality. It allowed us to move beyond custom dbt tests and constant threshold adjustments, offering a more intuitive and reliable approach to data QA. While the need for manual QA still exists (you preferably want to get your features right before launch), Elementary ensured that failures no longer fell through the cracks. I don’t know if you ever want to be happy about having problems, but the first day I pulled up the dashboard, spotted an anomaly, passed it to the software team, and triaged an uncommunicated buggy release, I couldn’t help but feel optimistic that we were going to get this right.

Conclusion

This journey was full of ups and downs and taught us a lot, especially about how important it is to be flexible and deliberate in choosing the right tools for fixing data quality problems. By sharing our story, I hope we can help others facing the same issues and show that, with the right tools and a practical approach, data challenges are solvable rather than a perpetual headache.