69 docs tagged with "data-quality-rules"

Classifying a Data Set

A logged-in user with Read/Write or Admin Entitlement can Classify the Data Set through:

Cloning a Resolve Project

A "Resolve Project" can be cloned to allow a user to tweak or change the Project's inputs and re-run it while keeping the original Project intact. This is an A/B experiment option provided to users for their Projects.

Cloning a Rule

You can clone a Business or a Technical rule from the main grid by clicking the ‘Clone’ option in the ellipsis menu next to the ID of the Rule.

Creating Business Rule

We’ve talked about Technical Rule creation in detail, but imagine a scenario where you’re a Fintech company that has Data about a customer’s account spread across multiple tables. Suppose you want to run a rule that says: do not provide Credit to a customer if the sum of their liabilities is greater than or equal to twice their assets.
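
As a rough illustration of the liabilities-versus-assets rule described above, here is a minimal Python sketch. It is not how Fluree Sense evaluates Business Rules; the two in-memory "tables", the customer IDs, and the function names are all hypothetical and only show the logic of combining values from more than one table.

```python
from collections import defaultdict

# Hypothetical rows from two separate tables, keyed by customer ID.
assets = [("C1", 50_000), ("C2", 20_000), ("C2", 5_000)]
liabilities = [("C1", 40_000), ("C2", 30_000), ("C2", 20_000)]


def total_by_customer(rows):
    """Sum the amounts per customer ID."""
    totals = defaultdict(int)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return totals


def credit_denied(assets, liabilities):
    """Return the sorted IDs of customers whose liabilities are >= 2x assets."""
    asset_totals = total_by_customer(assets)
    liability_totals = total_by_customer(liabilities)
    customers = set(asset_totals) | set(liability_totals)
    return sorted(
        c for c in customers
        if liability_totals.get(c, 0) >= 2 * asset_totals.get(c, 0)
    )


denied = credit_denied(assets, liabilities)
print(denied)
```

With this sample data, customer C2 has total liabilities equal to exactly twice their total assets, so the "greater than or equal to" condition flags them, while C1 is not flagged.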

Creating Technical Rule

Let us get started by checking out how a Tenant’s user can create a Technical Rule. Any Tenant user can create a Technical or Business Rule.

Data Quality

If your Tenant has the license to use the ‘Data Quality’ add-on product, the Data Quality tab will allow you to view the inherent quality of your data in multiple ways at multiple levels. This is a huge topic and one we’ll cover in the dedicated section on Data Quality.

Data Quality Exceptions in Golden Records

We’ve talked about the Data Quality of Golden Records in the earlier section. Apart from metrics and summary information on quality, the system also provides details of exceptions and rules that caused conflict pertaining to EACH Golden Record.

Data Quality In Resolve

Measuring and improving the quality of data is critical for any organization. With this in mind, Fluree Sense provides Data Quality as an important add-on for both the Classify and Resolve modules. The details of Data Quality Rule creation and management can be found here. But let's look at how the Data Quality Module integrates with Resolve.

Data Quality Main Views

A Data Quality Rule, in simple terms, is a predefined expectation that the Data should follow; when the Data does not follow it, the violation is captured as an ‘exception.’ There are 2 types of Rules in the system:

Data Quality of Golden Records

Now let’s talk about the Data Quality measure. For Data Management teams, it is important to gauge and improve the quality of data, especially for Golden Records, which the teams will treat as a refined source of truth. To enable this, we’ve provided users with a holistic look at the Data Quality of the generated Golden Record Dataset.

Data Quality Rules for Entities

As explained in the earlier sections, the Resolve project involves using Business Entities to generate a Golden Record, ideally containing the most complete and up-to-date set of information after complex matching and merging operations. To get the most out of this process, it is important to keep the quality of the data used in those Entities in check.

Data Source Content Entitlements

In the previous section, we assigned Data Source Admin rights, but that is not the end of the matter. Each Data Source contains content files or Tables. To use the Data from these files or Tables, selected users must also be given entitlements to them.

Deleting a Project

A user may wish to Delete a Resolve Project as part of a normal Cleanup. This is a soft delete, but currently, there is no way to retrieve the Project from the UI. Deletion of the Project removes it from Display in the project list.

Deleting Rules

Data Quality rules can be deleted one at a time or in bulk. The logged-in user needs Rule Admin entitlements on the rule(s) to delete them. A deleted rule immediately disappears from all the grids where it used to appear and is no longer counted or used for Data Quality processing. Internally, this is implemented as a soft delete.
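
To make the soft-delete behaviour described above concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not Fluree Sense's internal implementation: the `Rule` and `RuleStore` classes, the `deleted` flag, and the `is_rule_admin` parameter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: int
    name: str
    deleted: bool = False  # soft-delete flag; the record itself is retained


class RuleStore:
    """Hypothetical in-memory store that hides soft-deleted rules."""

    def __init__(self, rules):
        self._rules = {r.rule_id: r for r in rules}

    def delete(self, rule_ids, is_rule_admin):
        """Soft-delete one or many rules; only a Rule Admin may do so."""
        if not is_rule_admin:
            raise PermissionError("Rule Admin entitlement is required")
        for rule_id in rule_ids:
            self._rules[rule_id].deleted = True

    def active_rules(self):
        """Rules still shown in grids and used for processing."""
        return [r for r in self._rules.values() if not r.deleted]


store = RuleStore([
    Rule(1, "not_null_email"),
    Rule(2, "valid_zip"),
    Rule(3, "positive_balance"),
])
store.delete([1, 2], is_rule_admin=True)  # bulk delete
remaining = [r.name for r in store.active_rules()]
print(remaining)
```

The deleted rules drop out of `active_rules()` immediately, but their records remain in the underlying store, which is the defining property of a soft delete.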

Editing a Rule Definition

To Edit a rule, you are provided with multiple tabs, each grouping a manageable set of the Rule's aspects. There is a specific tab for editing the Rule Definition, a tab for editing the Rule Entitlements, and another for editing the Rule Schedule. Additionally, there are tabs to view or provide feedback on the Rule's Applied Columns and to view the Exceptions of the rule.

Editing Rule Entitlement(s)

You can always change the Rule entitlements for your Rule if you’re the Rule Admin. A Technical or Business rule provides for two roles:

Editing Rule Schedule(s)

We’ve talked about the various options that come with Rule Scheduling in the earlier section. Now, let’s see how a user can edit existing Rule Schedule(s). This feature is available only to the Rule Admin of the rule; for other users, the relevant icon or button remains disabled.

Entity Attributes and Reference Data

In the sections that talked about creating and editing an Entity, we saw how the Entity Attributes could be created or edited. When the Entity is in edit mode, the user can also link Reference Data to an Entity’s attribute by linking a Dataset Column to the Entity Attribute. This is shown in the image below. Attributes with pre-existing Reference Data show the copy icon in the column, and when opened, they also show the linked Reference Data.

Fixing Tasks

Fixing Tasks, as the name suggests, are the Tasks to "fix" any final or remaining "Data Issues," where the Machine Learning model can't be of much help. This usually happens when a machine learning model has reached or passed a threshold limit of confidence, after which tuning or training would lead to diminishing returns.

Getting Started

Log in to your account by accessing the URL provided to you and entering the provisioned User ID and password as shown below.

Giving Feedback to Ad-hoc Mappings

Now that we have seen what ad-hoc mappings look like in the earlier section, let's check out how we can give feedback on these mappings. The feedback process is almost the same at both the Semantic Object and Concept levels. The only difference is that feedback at the Semantic Object level is given on Data Set mappings, whereas feedback at the Concept level is given on Data Set column mappings.

Importing Rules in Bulk

Fluree Sense also provides an interface to create rules quickly and easily in bulk through import. You can import both Technical and Business rules in Bulk.

Introduction to Data Quality

The Classify Product also ships with a very powerful and comprehensive Add-On called Data Quality. Data Quality in a nutshell is the 360-degree view of the quality of your data across dimensions such as Timeliness, Accuracy, Validity, Completeness, etc. Any Data Discovery, Classification and de-duplication exercise is incomplete without the knowledge of the inherent quality of Data. The Data Quality add-on brings this into the system both at a business-user level and specifically down to tables, joins and filters.

Managing Data Source Entitlements

We’ve seen in earlier sections that a user can only modify or access the details of a Data Source where they are the Data Source Admin.

Managing Project Tasks by Admin

In the earlier sections, we've seen how a Project Reviewer, Approver, and Project Admin can provide feedback for Tasks in the Project's "Train Model" screens. Resolve Projects also have a dedicated Manage Project Tasks screen that is accessible only by the Project Admin.

Mappings in Business Rule

Mappings are the lifeblood of our Classify product; they enrich and power the Machine Learning System. Therefore, while creating a Business Rule, you have the option to Remove Existing Mappings and/or Add New Mappings to your Concept as part of the rule.

Other Data Quality Rule Views

The Fluree Sense Data Quality feature provides a 360-degree view of the quality of your data. You can view Data Quality not only at the Data Set level but also at the Catalog (Data Dictionary), Semantic Object, or Concept level. Some of these views also depend on your licensing – for example, the Catalog, Semantic Object, and Concept level views will only be visible if you have the Classify Product licensed.

Out of the Box Rules

An ‘Out of the Box’ Rule, as the name suggests, is a pre-packaged or pre-developed rule that needs only minimal information to be set up and to validate quality on certain aspects.

Re-running Rules

If you recall, during rule creation we had the option to just Save the rule or Save & Run. So what happens if we just Save the rule? How do we run it – especially if it is scheduled as ‘Once’ or as a manually triggered rule?

Rule Applied Columns

Let’s circle back to the point when we were Creating a Business Rule. If you recall, in that flow we were able to review the existing Data Columns mapped to the primary Concept of that rule. We could also add and remove mappings so as to re-adjust the Concept’s model before running or scheduling a Data Quality rule. This ability is provided as a flexible add-on feature for users.

Rule Exceptions

Exceptions are essentially the results of the rule. If the rule is broken for any record of the concerned Data Set, an exception record is generated. Exceptions are available to view once the Rule Run is complete and the Score for that rule appears in the Rules List grid.
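
The relationship between records, exceptions, and a score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `run_rule` function and the sample records are hypothetical, and the score formula (percentage of records that pass) is an assumption, since the excerpt above does not say how Fluree Sense computes the Score.

```python
def run_rule(records, predicate):
    """Return (exceptions, score) for one hypothetical rule run.

    A record that fails the predicate becomes an exception record.
    The score here is the percentage of passing records (an assumption).
    """
    exceptions = [r for r in records if not predicate(r)]
    if not records:
        return exceptions, None
    score = 100.0 * (len(records) - len(exceptions)) / len(records)
    return exceptions, score


# Hypothetical data set and rule: every record must have a non-empty email.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]
exceptions, score = run_rule(records, lambda r: bool(r["email"]))
print([r["id"] for r in exceptions], score)
```

Here records 2 and 4 break the rule, so two exception records are produced and half of the data set passes.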

Rule Validations & Error Handling

Fluree Sense also provides rule validation for cases where a Rule fails to compile successfully at run time during Job processing. At the most basic level, a rule converts to a query or function, so if that query fails or some other issue occurs, the error information is provided to the user in the rule itself. A rule failing validation will appear with the following visual aids:
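
The idea that a rule becomes a query whose failure is reported back on the rule can be sketched with Python's standard `sqlite3` module. This is a hypothetical illustration, not the product's engine: the `accounts` table, the rule dictionaries, and the `status` and `error` fields are assumptions made for the example.

```python
import sqlite3


def run_rule_query(conn, rule):
    """Run the rule's query; on failure, record the error on the rule itself."""
    try:
        rule["exception_count"] = conn.execute(rule["query"]).fetchone()[0]
        rule["status"] = "ok"
        rule["error"] = None
    except sqlite3.Error as exc:
        rule["exception_count"] = None
        rule["status"] = "failed"
        rule["error"] = str(exc)  # surfaced to the user alongside the rule
    return rule


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 10.0), (2, -5.0)])

good = run_rule_query(conn, {
    "name": "non_negative_balance",
    "query": "SELECT COUNT(*) FROM accounts WHERE balance < 0",
})
bad = run_rule_query(conn, {
    "name": "typo_in_column",
    "query": "SELECT COUNT(*) FROM accounts WHERE balanse < 0",  # misspelled column
})
print(good["status"], good["exception_count"])
print(bad["status"], bad["error"])
```

The first rule runs and counts one violating row; the second fails because of the misspelled column, and the database error message is stored on the rule rather than being lost in the job log.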

Rule Views at Dataset Level

A user can also analyze the Data Quality at the Dataset Level starting from the whole Dataset down to specific columns and then for each rule on that column.

Running the Model

Another aspect that the user needs to be aware of is that whenever a model Run is activated, whether by Classifying a Dataset or by training a model at the Object or Concept Level, it triggers classification for the whole tenant. This is because the Concept is linked to other Concepts, Data Quality Rules, and Data Sets, so a change in that Concept cannot be independent. The changes therefore occur across the Tenant in a holistic manner, as determined by the machine learning model.

Scheduling Rules

A Rule’s schedule is the cadence it runs on. A rule can be triggered either once or according to a schedule.

Viewing Catalogs

All the active Catalogs appear in the Catalog List screen with their names and some other useful information as shown below. Users can access this screen from the ‘Catalog’ option in the left nav of Classify.

Viewing Entities Mastered

To view “Entities Mastered”, click on the “View Results” icon (marked 1) in the lower right panel:

Viewing Entities Resolved

Now, let's look at the results of the "Entity Resolution" model. You can access the results by clicking the eyeglass or "View Results" icon in the "Entities Resolved" panel.

Viewing Project Home Screen

Let's circle back to the Project Creation Flow. After the user has mapped the details and "Run" the Project, it is sent as a job to the Cluster. It may take up to a minute for the Job to move to the processing queue and the progress display to appear on the screen. Once the Job starts, the user can see the progress through various stages of the Resolve project through progress bars with text information across the result areas.

Viewing Resolve Project Confidence

There are two important measures related to Golden Records. One is the Model Confidence, and the other is Data Quality. Let us look at Confidence first. The Confidence is shown separately for Entities Resolved and Entities Mastered. The Model Confidence in Resolve, and typically across the product, is split into High, Medium, and Low confidence records, which together give the combined confidence figure.