Recently, we had the opportunity to speak with Oren and Shenhav from Fiverr, both of whom have been deeply involved in transforming their company’s data platform. Here, we dive into their journey, the challenges they faced, and the strategic decisions they made to future-proof their data platform.
The Challenges of Scaling Data at Fiverr
Shenhav, who leads the Data Development Group at Fiverr, began by explaining why Fiverr decided to refactor its existing data platform. “The key drivers behind refactoring the data platform were scalability and flexibility,” she said. “The existing system was difficult to maintain as the company grew, and the lack of flexibility in the infrastructure made it challenging to accommodate new data sources and support advanced analytics or real-time data processing.”
The goals were clear: improve scalability, enhance data quality, and make the platform more adaptable to future growth. This included optimizing ETL pipelines, implementing robust data governance practices, and increasing transparency and observability to monitor data health in real time.
Deciding Between Fixing or Rebuilding
For many companies, there comes a critical juncture: should they fix their existing data infrastructure or rebuild it from scratch? Oren, a data developer at Fiverr, highlighted their decision-making process: “We involved all relevant stakeholders – business users, data engineers, and analysts – to ensure everyone’s needs and challenges were understood. This alignment was crucial, whether we were fixing the existing system or rebuilding a new one.”
Their approach began with identifying the requirements and pain points, clarifying gaps, and setting the foundation for either an improved or completely new solution.
A Vision for Transparency and Measurable Data Quality
Shenhav emphasized the importance of data quality from the outset. "I wanted to take a wider view of this project, not just focusing on the modeling side," she explained. "We wanted this new data platform to provide lasting value, and I know that data quality can become a significant pain point down the line. That's why I decided it would be easier to address data quality as part of the modeling process."
The objective was to create a transparent system where data health could be monitored in real time, with a clear metric that stakeholders could easily understand. This led the team to define specific metrics such as completeness, accuracy, and consistency, which serve as concrete evidence of data reliability and of improvement over time.
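To make metrics like these concrete, here is a minimal Python sketch of how completeness and consistency could be expressed as simple ratios. The function names, the sample `orders` rows, and the rule used for consistency are illustrative assumptions, not Fiverr's or Elementary's actual definitions.

```python
from typing import Any, Callable, Mapping, Sequence

Row = Mapping[str, Any]


def completeness(rows: Sequence[Row], column: str) -> float:
    """Share of rows whose `column` is present and not None."""
    if not rows:
        return 1.0  # assumption: an empty table has nothing missing
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def consistency(rows: Sequence[Row], rule: Callable[[Row], bool]) -> float:
    """Share of rows that satisfy a caller-supplied business rule."""
    if not rows:
        return 1.0  # assumption: an empty table violates no rule
    return sum(1 for row in rows if rule(row)) / len(rows)


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "amount": 20.0, "status": "paid"},
    {"order_id": 2, "amount": None, "status": "paid"},
    {"order_id": 3, "amount": 5.0, "status": "unknown"},
    {"order_id": 4, "amount": 12.5, "status": "refunded"},
]

amount_completeness = completeness(orders, "amount")
status_consistency = consistency(
    orders, lambda row: row["status"] in {"paid", "refunded"}
)
```

Expressing each metric as a ratio between 0 and 1 keeps it easy to track over time and to explain to non-technical stakeholders.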
Choosing Elementary for Data Observability
To support their data quality goals, Fiverr explored several tools and approaches. They began by using open-source packages with dbt, such as dbt-expectations and dbt-utils, which provided basic data quality checks. However, they soon discovered Elementary, a tool with robust anomaly detection and observability features.
Oren explained the transition: “We started with Elementary’s open-source platform to run tests on sample data and evaluate its effectiveness in identifying data quality issues. As we progressed, it became clear we needed a more comprehensive way to visualize and manage insights, which led us to explore Elementary’s Cloud Platform.”
From Open-Source to Cloud: A Smooth Transition
Fiverr’s decision to move from Elementary's open-source tool to the Cloud Platform was driven by the need to manage the growing complexity of their data systems while maintaining a strong focus on data quality. The open-source solution had already proved its value by allowing them to run tests and monitor data quality efficiently.
However, as their data infrastructure expanded, they encountered challenges in tracking trends and managing data quality with the existing setup. Oren explained, "Initially, we considered building our own tools or integrating with business intelligence (BI) tools, but we quickly realized that developing new tools in-house would add unnecessary complexity. Instead, we looked for a solution that would provide quick wins for enhancing observability without disrupting our existing workflow."
Shenhav highlighted why the Elementary Cloud Platform was the right choice: "The cloud-based tool we selected was easy to implement and provided immediate impact for observability features. It allowed us to monitor our data quality in a more structured and clear manner. We could see everything organized in one place, which significantly improved our ability to maintain high data quality across our projects."
The transition was designed to be seamless, leveraging the existing open-source framework. Or, speaking for Elementary, explained how the team ensured a smooth integration: "We invested a lot in making the transition as easy and smooth as possible. Our Cloud Platform complements the open-source package, not replacing it. The open-source dbt package works like an SDK, running in the data pipelines, collecting metadata, logs, and results, and uploading everything into the data warehouse. The Cloud Platform connects to this and syncs all the necessary data, allowing for comprehensive monitoring and alerting without needing to reconfigure new tests or metadata."
This approach meant that Fiverr didn’t have to overhaul their current setup or rework their configurations. Instead, they could build on their existing work, maintaining continuity while gaining new capabilities in data observability.
Additionally, the platform provided more robust visualization and alerting features, enabling Fiverr to monitor data quality more effectively and respond to issues in real time. "As our system expanded," Shenhav added, "the cloud platform allowed us to track trends and manage data quality in a much more comprehensive way, something we found challenging with the open-source setup alone."
By making this transition, Fiverr gained the ability to handle their growing data needs more effectively, ensuring a strong foundation for future growth and scalability.
Certifying Data Tables for Stakeholder Confidence
Shenhav explained that one of the future goals for Fiverr’s data platform is to create a certification process for data tables, allowing stakeholders to quickly assess whether a dataset is reliable and ready for use. The vision is simple: “Each stakeholder interested in data can open the screen in the morning, get a quick overview of the data, and feel confident in its quality.”
The certification process involves using a data health score to indicate the quality and readiness of datasets. If the data quality score meets or exceeds a predefined threshold, the dataset is considered certified and safe for use. Conversely, if the score falls below this threshold, stakeholders are alerted to potential issues that need to be addressed before relying on the data for decision-making.
“We’re still refining this system,” Shenhav noted, “but the goal is to allow data consumers to view the score and immediately know if the data is of good quality and ready for them to use. The score should provide a quick snapshot of data health, enabling stakeholders to make informed decisions.”
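The threshold rule described above can be sketched in a few lines of Python. This is only an illustration: the `TableHealth` type, the 0.9 threshold, and the table names are hypothetical, and the sketch takes the health score as a given input rather than showing how Elementary or Fiverr actually compute it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical threshold; the article does not state the value Fiverr uses.
CERTIFICATION_THRESHOLD = 0.9


@dataclass(frozen=True)
class TableHealth:
    table: str
    score: float  # assumed to be normalized to the range 0.0 to 1.0


def is_certified(
    health: TableHealth, threshold: float = CERTIFICATION_THRESHOLD
) -> bool:
    """A table is certified when its score meets or exceeds the threshold."""
    return health.score >= threshold


def certification_report(
    tables: Iterable[TableHealth], threshold: float = CERTIFICATION_THRESHOLD
) -> Dict[str, str]:
    """Map each table name to a status a stakeholder can read at a glance."""
    return {
        t.table: "certified" if is_certified(t, threshold) else "needs attention"
        for t in tables
    }


# Hypothetical tables and scores, for illustration only.
report = certification_report(
    [TableHealth("orders", 0.97), TableHealth("sessions", 0.82)]
)
```

Keeping the rule as a single comparison against one number is what lets a data consumer read the status without knowing which individual tests sit behind the score.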
The certification process not only boosts transparency but also establishes a common language between data teams and stakeholders. By creating clear standards and consistently measuring data quality, Fiverr aims to build trust and promote a data-driven culture where stakeholders can confidently rely on the data provided.
Testing the Tests: A Systematic Approach to Reducing Alert Fatigue
A critical component of maintaining data quality is ensuring that the tests themselves are reliable and meaningful. To achieve this, Fiverr adopted a "testing the tests" approach. This strategy helps the team refine and validate data quality checks before they become fully operational, reducing the risk of alert fatigue from false positives or non-critical issues.
Oren explained, “You need to start with a baseline of essential tests that cover the most critical areas. In the beginning, we set thresholds for alerts to ensure that not every minor issue triggers an alert. We focus on significant deviations that require immediate attention.”
Initially, tests are set to a warning mode rather than a critical alert status. This allows the team to track how often these tests trigger warnings and whether they accurately reflect real data quality issues. “We track these warning-mode tests over time,” Oren continued, “and only promote them to operational status when we are confident they are accurate and valuable.”
By using separate channels for warning alerts and critical issues, Fiverr ensures that the team remains focused on resolving the most impactful problems while still refining their overall data quality strategy. This approach minimizes the risk of alert fatigue and keeps the team aligned with the organization’s data quality objectives.
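As a rough illustration of this workflow, the Python sketch below models a test that starts in warning mode, is promoted once its track record looks trustworthy, and is routed to a different channel depending on its severity. The promotion criteria (a minimum number of triggers and a minimum share of confirmed issues), the field names, and the channel names are all assumptions made for the example; the article does not specify how Fiverr decides when a test is ready.

```python
from dataclasses import dataclass

# Hypothetical promotion criteria, not taken from the article.
MIN_TRIGGERS = 5      # need enough history before judging a test
MIN_PRECISION = 0.8   # share of triggers that were real issues

# Hypothetical channel names.
WARNING_CHANNEL = "#data-quality-warnings"
CRITICAL_CHANNEL = "#data-quality-critical"


@dataclass
class CheckHistory:
    name: str
    severity: str = "warn"  # new tests start in warning mode
    triggered: int = 0      # how many times the test fired
    confirmed: int = 0      # how many of those were real data issues


def should_promote(check: CheckHistory) -> bool:
    """Promote only warning-mode tests with a long and accurate record."""
    if check.severity != "warn" or check.triggered < MIN_TRIGGERS:
        return False
    return check.confirmed / check.triggered >= MIN_PRECISION


def promote(check: CheckHistory) -> None:
    """Switch a qualifying test from warning mode to critical status."""
    if should_promote(check):
        check.severity = "error"


def alert_channel(check: CheckHistory) -> str:
    """Route critical tests and warning-mode tests to separate channels."""
    return CRITICAL_CHANNEL if check.severity == "error" else WARNING_CHANNEL


reliable = CheckHistory("row_count_anomaly", triggered=10, confirmed=9)
noisy = CheckHistory("freshness_check", triggered=10, confirmed=3)
```

Calling `promote(reliable)` moves that test to the critical channel, while `noisy` stays in the warning channel until it is tuned or deleted.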
Building a Strong Data Culture
While adopting the right tools is essential, Shenhav and Oren emphasized the importance of building a strong data culture and clear processes. This involved working closely with stakeholders to understand their needs, implementing standards, and integrating data visualization, anomaly detection, and data cleaning into a systematic workflow.
“It’s crucial to have training sessions and educate the team on best practices for data management and quality,” Oren advised. “Data quality is not a one-time setup; it’s an ongoing process of tuning tests, deleting unnecessary ones, and creating new ones as needed.”
A Proactive Approach to Data Quality
Fiverr’s strategy for managing data quality is proactive rather than reactive. Instead of waiting for data issues to impact business operations, the team uses real-time insights to address potential problems before they escalate, with the aim of continuously improving data quality.
“The vision is to allow data consumers to view the score and immediately know if the data is ready and of good quality,” said Shenhav. “We’re still refining this system, but the end goal is to have a self-service approach where stakeholders can check the data health at any time.”
Learn more about the Elementary Cloud Platform.