Feathr – A scalable, unified data and AI engineering platform for enterprise
Updated
Apr 4, 2024 - Scala
Automated data quality suggestions and analysis with Deequ on AWS Glue
Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool.
The Lightning Catalog is an open-source data catalog designed for preparing data at any scale in ad-hoc analytics, data virtualization, data warehousing, lake houses, and ML projects.
Data quality control tool built on Spark and Deequ
Example API implementation for Data Caterer
A library for Spark that helps to standardize any input data (DataFrame) to adhere to the provided schema.
Data generation and validation tool for any data source
A Quality Spark DQ and transformation Library
PoC Spark wrapper for validating data
Let's be honest - most data pipeline frameworks treat types as suggestions. Config files are strings. Schemas are "validated" at runtime. Data quality is an afterthought. So let's do it differently.
An extensible and configurable ETL tool built on top of Apache Spark