GSPANN built a roadmap to implement the data quality framework, breaking it down into quarterly milestones:
- Defined the audit framework and the data project roadmap capabilities, participated in technology selection, and set up the environment.
- Connected the audit framework with Griffin.
- Defined KPI thresholds to build an alert and trigger mechanism for the common desktop environment and SLA-based automation.
- Defined a data quality scoring system for business analysis, system analysis, source systems, and the enterprise data warehouse, and re-architected the data quality business rules (see the sketch after this list).
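As one illustration of how such a scoring system and KPI thresholds could fit together, here is a minimal sketch; the dimensions, weights, and threshold values are hypothetical, not GSPANN's actual rules:

```python
# Hypothetical KPI thresholds per data quality dimension; the real
# values were defined per source system and SLA.
THRESHOLDS = {
    "completeness": 0.98,
    "accuracy": 0.95,
    "timeliness": 0.90,
}

# Hypothetical weights for the composite data quality score.
WEIGHTS = {"completeness": 0.5, "accuracy": 0.3, "timeliness": 0.2}


def quality_score(metrics: dict[str, float]) -> float:
    """Weighted composite score across quality dimensions (0.0-1.0)."""
    return sum(WEIGHTS[dim] * metrics.get(dim, 0.0) for dim in WEIGHTS)


def breached_kpis(metrics: dict[str, float]) -> list[str]:
    """Return the dimensions whose metric fell below its KPI threshold."""
    return [dim for dim, floor in THRESHOLDS.items()
            if metrics.get(dim, 0.0) < floor]


if __name__ == "__main__":
    run_metrics = {"completeness": 0.99, "accuracy": 0.93, "timeliness": 0.97}
    print(f"score={quality_score(run_metrics):.3f}")  # 0.968
    for dim in breached_kpis(run_metrics):
        print(f"ALERT: {dim} below threshold")        # accuracy breached
```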
In the new process flow, the user first defines the measure and the flow of filters in the data analytics/data quality (DA/DQ) framework. The job scheduler then sends events to execute the job flow, after which the DA/DQ framework makes a REST API call to the job scheduler to fetch execution information for the concluded job run and stores that data in a MySQL database.
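A minimal sketch of that scheduler handoff, assuming a generic REST endpoint and a hypothetical `job_runs` table (the article names neither):

```python
import requests
import pymysql

# Hypothetical endpoint; the scheduler's actual API is not named.
SCHEDULER_API = "http://scheduler.example.com/api/v1/runs"


def record_job_run(job_id: str) -> None:
    """Fetch execution info for a concluded job run and persist it."""
    resp = requests.get(f"{SCHEDULER_API}/{job_id}", timeout=30)
    resp.raise_for_status()
    run = resp.json()

    # Placeholder connection details and schema.
    conn = pymysql.connect(host="db.example.com", user="dq",
                           password="***", database="dq_framework")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO job_runs (job_id, status, started_at, ended_at) "
                "VALUES (%s, %s, %s, %s)",
                (job_id, run["status"], run["startTime"], run["endTime"]),
            )
        conn.commit()
    finally:
        conn.close()
```

From there, the flow runs through Apache Griffin: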
- The DA/DQ framework finds and triggers the Griffin job linked to the current flow. Griffin sends an HTTP request to Apache Livy, which starts the Spark application using the measure.jar and measure.json supplied in the request.
- Spark executes measure.jar, reads data from the source, writes metrics to the configured sinks, and sends a callback to the DA/DQ framework with the YARN application ID.
- On receiving the callback from measure.jar, the DA/DQ framework reads the metrics from Elasticsearch and sends notifications if any rules or measures fail, as sketched below.
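The trigger-and-callback loop might look roughly like this. Livy's `/batches` endpoint is its standard batch-submission API; the measure.jar path, the Griffin entry class and argument order, the Elasticsearch index, and the metric field names are all assumptions, not details from the article:

```python
import requests

LIVY_URL = "http://livy.example.com:8998/batches"         # placeholder host
ES_URL = "http://es.example.com:9200/dq_metrics/_search"  # placeholder index


def submit_measure(measure_json_path: str) -> int:
    """Ask Livy to launch the Griffin measure Spark app (sketch)."""
    payload = {
        "file": "hdfs:///griffin/measure.jar",  # assumed HDFS path
        "className": "org.apache.griffin.measure.Application",
        "args": ["env.json", measure_json_path],
    }
    resp = requests.post(LIVY_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]  # Livy batch id


def failed_rules(app_id: str, thresholds: dict[str, float]) -> list[str]:
    """On callback, read the run's metrics from Elasticsearch and
    return the names of any rules that fell below threshold (sketch)."""
    query = {"query": {"match": {"applicationId": app_id}}}  # assumed field
    resp = requests.get(ES_URL, json=query, timeout=30)
    resp.raise_for_status()
    failures = []
    for hit in resp.json()["hits"]["hits"]:
        metric = hit["_source"]
        if metric["value"] < thresholds.get(metric["name"], 0.0):
            failures.append(metric["name"])
    return failures
```

Submitting through Livy keeps the DA/DQ framework decoupled from the Spark cluster: it only needs HTTP access, and the YARN application ID returned in the callback is enough to correlate the metrics in Elasticsearch with the originating job flow.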