Case Study: Synthesizer – powering scalable, privacy-first test data at Aviva
Introducing Synthesizer, the backbone of data-driven development at Aviva
Synthesizer is Aviva’s in-house synthetic data generation tool, purpose-built to meet the growing demand for high-quality, privacy-compliant test data. Designed with flexibility and scale in mind, Synthesizer lets teams generate tens of thousands of rows of realistic, production-like data, supporting both functional and non-functional testing across a wide range of environments.
Business challenge
Across Aviva, teams often face delays and constraints when sourcing test data. Real datasets are either too sensitive to use freely or too limited in scope to support robust testing. Manual data creation is time-consuming and prone to errors, while anonymisation techniques can compromise data utility. These challenges hinder delivery velocity and increase the risk of defects slipping into production.
Solution
Synthesizer was developed to address these challenges head-on. It enables users to define custom schemas or replicate existing Hive tables, then generate synthetic data that mirrors the structure and statistical characteristics of real datasets—without exposing any personal or confidential information.
The tool supports:
- High-volume generation: Capable of producing tens of thousands of rows across multiple tables, ideal for simulating full-scale data flows.
- Custom logic and rules: Users can define column-level behaviours, such as conditional logic, blank value ratios, and derived fields.
- Multi-format output: Data can be exported in CSV, JSON, Parquet, or TXT formats, and saved to EFS or AWS S3 for seamless integration.
- Hive integration: Synthesizer can auto-generate Hive DDLs and create Hive tables directly from the synthetic data, streamlining ingestion and validation workflows.
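To make the column-level rules and multi-format output more concrete, here is a minimal Python sketch. It is not Synthesizer's actual configuration syntax or API: the schema keys (`kind`, `blank_ratio`, `fn`), the column names, and the helper functions are all hypothetical and chosen only for illustration. It shows a sequence column, a column with a blank value ratio, and a derived field driven by conditional logic, then serialises the rows as CSV and JSON lines using only the standard library.

```python
import csv
import io
import json
import random

# Hypothetical schema format: every key and rule name here is an assumption
# made for illustration, not Synthesizer's real configuration syntax.
SCHEMA = [
    {"name": "policy_id", "kind": "sequence", "start": 1000},
    {"name": "product", "kind": "choice", "values": ["motor", "home", "life"]},
    {"name": "premium", "kind": "uniform", "low": 100.0, "high": 900.0},
    {"name": "email", "kind": "choice",
     "values": ["a@example.com", "b@example.com"], "blank_ratio": 0.3},
    # Derived field with conditional logic; it must be listed after the
    # columns it reads from, because rows are filled left to right.
    {"name": "band", "kind": "derived",
     "fn": lambda row: "high" if row["premium"] >= 500 else "standard"},
]


def generate_rows(schema, n, seed=0):
    """Generate n synthetic rows; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        row = {}
        for col in schema:
            kind = col["kind"]
            if kind == "sequence":
                value = col["start"] + i
            elif kind == "choice":
                value = rng.choice(col["values"])
            elif kind == "uniform":
                value = round(rng.uniform(col["low"], col["high"]), 2)
            elif kind == "derived":
                value = col["fn"](row)
            else:
                raise ValueError(f"unknown column kind: {kind}")
            # Blank out a share of values to mimic incomplete source data.
            if rng.random() < col.get("blank_ratio", 0.0):
                value = None
            row[col["name"]] = value
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise rows as CSV text; None values become empty fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Serialise rows as newline-delimited JSON; None values become null."""
    return "\n".join(json.dumps(row) for row in rows)
```

Calling `generate_rows(SCHEMA, 10_000)` would produce ten thousand rows in which roughly 30% of `email` values are blank. Writing the serialised text to EFS or S3, or producing Parquet, would need additional libraries and is left out of this sketch.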
All technical capabilities are grounded in Aviva’s internal architecture and have been validated through real-world use cases across the business.
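As a rough illustration of the Hive integration described above, the following Python sketch derives a `CREATE EXTERNAL TABLE` statement from sample rows. The type mapping, the function names, the choice of an external Parquet table, and the example table name and S3 location are assumptions for illustration only; the source does not describe how Synthesizer builds its DDLs.

```python
# Hypothetical mapping from Python value types to Hive column types.
HIVE_TYPES = {int: "BIGINT", float: "DOUBLE", str: "STRING", bool: "BOOLEAN"}


def infer_hive_type(values):
    """Pick a Hive type from the first non-null value; default to STRING."""
    for value in values:
        if value is not None:
            return HIVE_TYPES.get(type(value), "STRING")
    return "STRING"


def hive_ddl(table, rows, location):
    """Build a CREATE EXTERNAL TABLE statement from a list of row dicts."""
    col_defs = []
    for name in rows[0].keys():
        hive_type = infer_hive_type(row[name] for row in rows)
        col_defs.append(f"  `{name}` {hive_type}")
    columns_sql = ",\n".join(col_defs)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{columns_sql}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Example with made-up rows and a placeholder bucket name.
sample_rows = [
    {"policy_id": 1, "premium": 10.5, "email": None},
    {"policy_id": 2, "premium": 20.0, "email": "x@example.com"},
]
print(hive_ddl("policies_synth", sample_rows,
               "s3://example-bucket/synthetic/policies/"))
```

Inferring types from the generated values rather than from a separate declaration keeps the table definition in step with the data, at the cost of falling back to `STRING` for columns that are entirely blank.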
Implementation at Aviva
Synthesizer has been successfully adopted across multiple programmes, including policy administration system (PAS) migrations, data lake onboarding, and dashboard testing.
During a recent PAS implementation, the tool was used to simulate end-to-end data ingestion by generating synthetic source system feeds. This allowed teams to validate ingestion pipelines, transformation logic, and reporting outputs—without relying on production data.
In another instance, Synthesizer enabled exploratory testing for a customer analytics dashboard. By generating targeted datasets with specific patterns and edge cases, the team was able to test system behaviour under a variety of conditions, ensuring robustness and reliability.
Results
- Accelerated delivery: Teams reported significant reductions in test data preparation time, enabling faster development cycles and earlier defect detection.
- Improved compliance: By removing the need for real data in test environments, Synthesizer supports alignment with GDPR and internal data governance policies.
- Enhanced scalability: The tool’s ability to generate large volumes of data across complex schemas makes it suitable for both unit testing and full-system validation.
- Cross-functional adoption: Synthesizer is now used by teams across Data Ingestion, Actuarial, Customer Marketing, and more—demonstrating its versatility and value.
Conclusion
Synthesizer is more than just a test data generator—it’s a strategic enabler for secure, scalable, and efficient data-driven development at Aviva. Its ability to produce high-fidelity synthetic data on demand is transforming how teams approach testing, innovation, and collaboration. As adoption grows, so too does the opportunity to unlock new use cases—from model training to performance benchmarking—while maintaining the highest standards of data privacy and integrity.
Aviva is a finalist in Computing's Cloud Excellence Awards