Assigning Data Source Admins
When creating a Data Source, or at any time afterward (as a Data Source Admin), a user can assign additional Admins to the Data Source through the screen below. Because the logged-in user is the one creating or administering the Data Source, they are automatically moved into the right panel and treated as a Data Source Admin.
Classifying a Data Set
A logged-in user with Read/Write or Admin Entitlement can Classify the Data Set through:
Cloning a Resolve Project
A "Resolve Project" can be cloned to allow a user to tweak or change the Project's inputs and re-run it while keeping the original Project intact. This is an A/B experiment option provided to users for their Projects.
Creating an Entity
To create a new Entity, please follow the steps listed below:
Creating New Data Set
New Data Set(s) can be created either by selecting the required Data Set content files or by running a Create All Job from the Data Source screen. Let us look at the first way below:
Creating New Data Source Connection
Creating a new Data Source connection begins with choosing the Data Source Type and some other details as shown in the screen below. You can have multiple Data Sources feeding into the Classify product. This also requires entering the:
Data Quality Exceptions in Golden Records
We’ve talked about the Data Quality of Golden Records in the earlier section. Apart from metrics and summary information on quality, the system also provides details of exceptions and rules that caused conflict pertaining to EACH Golden Record.
Data Quality of Golden Records
Now let’s talk about the Data Quality measure. For Data Management teams, it is important to gauge and improve the quality of data, especially for Golden Records, which will be considered a refined source of truth for the teams. To enable this, we’ve provided users with a holistic look at the Data Quality of the generated Golden Record Dataset.
Data Set Object Roles & Entitlements
When creating a Data Set, the logged-in user needs to grant Entitlements to that Data Set to themselves and to other users associated with the Tenant. These Entitlements are:
Data Set Relationships
Data Set Relationships can be accessed through the namesake tab (i.e., ‘Data Set Relationship’) after opening a Data Set. Access to this tab requires a minimum of Data Read Entitlement for the Data Set.
Data Set Sample
By clicking on the Data Set Sample tab, the user is taken to a screen where a sample of all the columns of the Data Set is shown. Note that Data Set Columns tagged as PII (Personally Identifiable Information) will be masked. Columns are not tagged as PII directly but through their Concepts in the Catalog, which we’ll talk about in another section.
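As a rough illustration of the masking behaviour described above, the Python sketch below masks sample values for any column whose mapped Concept is flagged as PII. The names `mask_sample`, `column_concept`, and `concept_is_pii` are hypothetical and are not part of Fluree Sense; the product's actual masking logic is not documented in this section.

```python
# Hypothetical illustration only; not Fluree Sense's actual implementation.
MASK = "****"


def mask_sample(rows, column_concept, concept_is_pii):
    """Return a copy of the sample rows with PII columns masked.

    rows: list of dicts (column name -> value)
    column_concept: dict mapping column name -> Catalog Concept name
    concept_is_pii: dict mapping Concept name -> bool
    """
    # A column is PII only because the Concept it maps to is flagged as PII.
    pii_columns = {
        column
        for column, concept in column_concept.items()
        if concept_is_pii.get(concept, False)
    }
    return [
        {col: (MASK if col in pii_columns else val) for col, val in row.items()}
        for row in rows
    ]


sample = [{"email": "a@example.com", "city": "Pune"}]
masked = mask_sample(
    sample,
    column_concept={"email": "Email Address", "city": "City"},
    concept_is_pii={"Email Address": True, "City": False},
)
```

The sketch returns new rows rather than changing the originals, which mirrors the idea that masking affects only what is displayed in the sample.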
Data Source Content Entitlements
In the previous section, we assigned Data Source Admin rights, but that is not the end of the matter. Each Data Source contains content files or Tables. To use the Data from these files or Tables, selected users must also be granted entitlements to them.
Dataset Attributes
After the user creates and registers a Data Set, they can click on it to be redirected to the main Data Set page. This page gives key information about the Data Set.
Deleting a Data Set
You can Delete a Data Set that you have no use for if you have Data Set Admin rights to that Data Set. This is a Soft-Delete: the file is not physically deleted, because Fluree Sense simply captures the meta-data from the physical data. The physical data will continue to reside in the appropriate Data Source.
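The soft-delete idea can be sketched in a few lines of Python. This is only an illustration under assumed names (`DataSetRecord`, `soft_delete`, `visible`); it does not reflect how Fluree Sense stores its meta-data internally.

```python
# Hypothetical illustration only; class and function names are assumptions.
from dataclasses import dataclass


@dataclass
class DataSetRecord:
    name: str
    source_path: str       # where the physical data lives in the Data Source
    deleted: bool = False  # soft-delete flag kept on the meta-data only


def soft_delete(record: DataSetRecord) -> None:
    """Mark the meta-data record as deleted; the physical data is untouched."""
    record.deleted = True


def visible(records):
    """Return only the records that have not been soft-deleted."""
    return [record for record in records if not record.deleted]


records = [
    DataSetRecord("customers", "/landing/customers.csv"),
    DataSetRecord("orders", "/landing/orders.csv"),
]
soft_delete(records[0])
remaining = visible(records)
```

Only the flag changes: the `source_path` still points at the same physical data after the delete.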
Deleting a Data Source
This feature is under consideration and is not currently available. If you are not actively using a Data Source, simply switch it off at the source for now.
Deleting a Project
A user may wish to Delete a Resolve Project as part of a normal Cleanup. This is a soft delete, but currently, there is no way to retrieve the Project from the UI. Deletion of the Project removes it from Display in the project list.
Editing a Data Set
Once a Data Set is added, it appears in the Data Set list screen. Depending on the processes that have run on it, you can view the Data Set columns, Sample, etc. If the Data Set registration job is complete, you will also be able to see the latest Concepts to which that Data Set’s columns are mapped.
Editing a Data Source
You can edit a Data Source that you have created if you have a Data Source Admin role for that Data Source. Please follow the steps below to edit a Data Source. These are essentially the same steps as in the Create Data Source workflow. You may either move to the Next step without making any edits in a specific screen, or make edits wherever you feel they are necessary.
Editing Data Set Entitlements
In an earlier section, we looked at how Data Set Entitlements are set when creating a Data Set. However, it is quite possible that you may wish to edit those existing rights. This can be done from the ‘Data Entitlements’ tab in the Data Set detail view.
Editing Golden Records Manually
Golden Records get edited in two ways.
Entity Attributes and Reference Data
In the sections that talked about creating and editing an Entity, we saw how the Entity Attributes could be created/edited. When the Entity is in edit mode, the user can also link Reference Data to an Entity’s attribute by linking a Data Set Column to the Entity Attribute. This is shown in the image below. Attributes with pre-existing Reference Data show the copy icon in the column and, when opened, also show the linked Reference Data.
Fixing Tasks
Fixing Tasks, as the name suggests, are the Tasks to "fix" any final or remaining "Data Issues," where the Machine Learning model can't be of much help. This usually happens when a machine learning model has reached or passed a threshold limit of confidence, after which tuning or training would lead to diminishing returns.
Getting Started
Log in to your account by accessing the URL provided to you and entering the provisioned User ID and password as shown below.
Giving Feedback to Ad-hoc Mappings
Now that we have seen what ad-hoc mappings look like in the earlier section, let's check out how we can give feedback on these mappings. The feedback process is almost the same at both the Semantic Object and Concept levels. The only difference is that feedback at the Semantic Object level is given to Data Set mappings, whereas feedback at the Concept level is given to Data Set column mappings.
Importing Concept Mappings
In the earlier sections, we saw how a user can provide feedback and mappings through various means, including the most recent case where the user can provide training through a workflow.
Introduction to Catalogs
A Catalog is the business data glossary, data dictionary or target data model that will be used as a reference for classifying data. A Catalog is composed of Semantic Objects (or data entities) and their underlying business concepts (or entity attributes). An organization may have multiple Catalogs. The Classify system can be trained to independently classify the same object to different data dictionaries, so a Catalog will ultimately be the linkage between the logical business glossary, and the physical meta-data in the Data Lake.
Introduction to Entities
An Entity in the Resolve module is the same as what we refer to as Semantic Objects in Classify. An Entity can be a uniquely identifiable person, institution or thing and is the business object which may be referenced by multiple data tables (or Data Sets as we call them). For example, let's say we have ‘Customer’ as an Entity, and we have a Data Set for ‘Customer Profile’ and another one for ‘Customer Address Information’. In this case, we may arrive at the conclusion that both data sets refer to the same Entity.
Key Terms and Concepts
- Tenant:
Managing Data Source Entitlements
We’ve seen in earlier sections that a user can only modify or access the details of a Data Source where they are the Data Source Admin.
Managing Project Tasks by Admin
In the earlier sections, we've seen how a Project Reviewer, Approver, and Project Admin can provide feedback for Tasks in the Project's "Train Model" screens. Resolve Projects also have a dedicated Manage Project Tasks screen that is accessible only to the Project Admin.
Overview of Classify
- Fluree Sense is a full end-to-end platform designed to Ingest, Classify, Resolve, and Consume Big Data.
Publishing Golden Records
Once the Golden Records have been generated with the level of confidence and quality you require, you can go ahead and publish them. Golden Records can be published any time after the first run of the Project. There is no system threshold, confidence level, etc. for publishing; it is left to users to decide when they want to publish their Golden Records Dataset.
Reassigning Catalog Tasks
Imagine a situation where a Task is assigned to a specific user, but that user is on leave or unable to work on those tasks. You’d probably re-assign it to a team member if a co-worker from the same department was there, right?
Refreshing & Re-profiling Data
A Data Set undergoes Registration and Profiling the first time it is registered. This is explained in detail in the Editing a Data Set section. However, in the practical world, data never stays constant. Often, a Data Source changes and is updated periodically. Provided certain conditions are met, Fluree Sense can refresh your data and get the delta (changed) records ad hoc or on a pre-set schedule.
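To make the idea of delta (changed) records concrete, here is a minimal Python sketch that compares two snapshots of a table by a key column and returns the rows that are new or different. The function name `delta` and the `id` key are assumptions made for illustration; the conditions and mechanism Fluree Sense actually uses for refresh are not described in this section.

```python
# Hypothetical illustration only; not Fluree Sense's refresh mechanism.
def delta(previous, current, key):
    """Return rows in `current` that are new or changed compared with `previous`.

    Rows are dicts; `key` names the column that uniquely identifies a row.
    Rows that were removed from `current` are not reported by this sketch.
    """
    previous_by_key = {row[key]: row for row in previous}
    return [row for row in current if previous_by_key.get(row[key]) != row]


old = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Oslo"}]
new = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Bergen"},  # changed
    {"id": 3, "city": "Lima"},    # new
]
changed = delta(old, new, key="id")
```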
Registering / Profiling a Data Set
As discussed in the Creating New Data Set section, once a new Data Set is created, the profiling and registration process is triggered as well. This process happens asynchronously and in steps. In the initial step, the Data Set sample and attributes are loaded and displayed. In the next step, the Data Set is profiled. Then, as the Classification task is run on the Data Set, Data Set Relationships are re-generated and DQ rules are re-run. While this happens, progress is indicated through the progress bar/loader in various sections of the Data Set.
Rule Applied Columns
Let’s circle back to the point when we were Creating a Business Rule. If you recall, in that flow we were able to review the existing Data Columns mapped to the primary Concept of that rule. We could also add and remove mappings so as to re-adjust the Concept’s model before running or scheduling a Data Quality rule. This ability is provided as a flexible add-on feature for users.
Rule Exceptions
Exceptions are essentially the results of the rule. If the rule is broken for any record of the concerned Data Set, an exception record is generated. Exceptions are available to view once the Rule Run is complete and the Score for that rule appears in the Rules List grid.
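The relationship between a rule and its exceptions can be sketched as follows. This is a simplified Python illustration: the rule `age_is_valid`, the `run_rule` helper, and the pass-rate calculation are all assumptions, and the way the product actually computes its Score is not stated here.

```python
# Hypothetical illustration only; rule and field names are made up for this sketch.
def run_rule(records, rule):
    """Apply a rule (a function returning True when a record passes)
    and collect one exception entry per failing record."""
    exceptions = [
        {"row": index, "record": record}
        for index, record in enumerate(records)
        if not rule(record)
    ]
    # Assumed score: the share of records that passed. The product's formula may differ.
    pass_rate = 1.0 if not records else 1 - len(exceptions) / len(records)
    return exceptions, pass_rate


def age_is_valid(record):
    """Example rule: age must fall between 0 and 120 inclusive."""
    return 0 <= record["age"] <= 120


customers = [{"age": 34}, {"age": -5}, {"age": 61}, {"age": 140}]
exceptions, pass_rate = run_rule(customers, age_is_valid)
```

Each failing record produces exactly one exception entry, which matches the description above of an exception record being generated whenever the rule is broken.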
System Configuration
Supported Platforms
Technical View of Semantic Objects
The Classify System provides users with the flexibility to examine their Business Objects in a Technical View as well. As the name suggests, this view focuses more on Data to Column relationships.
Training at Semantic Object Level
Training the Model at Object Level - through workflow:
Training Merging Tasks
The Golden Record creation (i.e., “Merging”) model synthesizes the records within a cluster into a single record containing the best data from all records in the cluster. So, if there are three possible addresses from records from three different sources in a cluster, the “Merging” model will attempt to select the most likely accurate address out of the three.
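As a very rough stand-in for what "selecting the best data from a cluster" means, the Python sketch below picks the most frequent non-empty value for each field. This simple majority heuristic is an assumption used purely for illustration; the actual "Merging" model is a trained model and will weigh evidence differently. The function name `merge_cluster` is hypothetical.

```python
# Hypothetical illustration only; a majority vote stands in for the trained Merging model.
from collections import Counter


def merge_cluster(records):
    """Build one merged record from a cluster of source records.

    For each field, the most frequent non-empty value across the cluster wins.
    """
    golden = {}
    fields = {field for record in records for field in record}
    for field in sorted(fields):
        values = [r[field] for r in records if r.get(field) not in (None, "")]
        if values:
            golden[field] = Counter(values).most_common(1)[0][0]
    return golden


cluster = [
    {"name": "Ann Lee", "address": "12 Oak St"},
    {"name": "Ann Lee", "address": "12 Oak Street"},
    {"name": "A. Lee", "address": "12 Oak St"},
]
golden_record = merge_cluster(cluster)
```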
Types of Data Sources
Fluree Sense allows different types of Data Sources and can take Data in the form of files, such as CSV, as well as from RDBMS Tables. Currently, Fluree Sense supports the following Data Sources:
User Management
Types of Users and Roles available
Viewing Ad-hoc Mappings to Catalogs
The whole purpose of creating a Catalog is the intelligent discovery of data and underlying relationships to match our business dictionary. So, let’s look at that at the simplest level. As a Classify user, you can look at existing ad-hoc mappings to your Data at:
Viewing Catalogs
All the active Catalogs appear in the Catalog List screen with their names and some other useful information as shown below. Users can access this screen from the ‘Catalog’ option in the left nav of Classify.
Viewing Data Sets
When the user clicks on the Data Set tab on the left menu, they will be directed to the main Data Set page. This page includes all the Data Sets that the user has access to, as well as some information about these Data Sets, including:
Viewing Data Sources
Fluree Sense allows users to access Data from various cloud-based and on-prem environments such as Databricks, Cloud Storage, Hadoop, Snowflake, or traditional RDBMS such as Microsoft SQL etc. In this section, we will explore the screen where you have a holistic view of all the data sources.
Viewing Entities Mastered
To view “Entities Mastered”, click the “View Results” icon (marked 1) in the lower right panel:
Viewing Entities Resolved
Now, let's look at the results of the "Entity Resolution" model. You can access the results by clicking the eyeglass or "View Results" icon in the "Entities Resolved" panel.
Viewing Golden Records History
Each time a Golden Record is changed, including the first time it is assigned, its history gets appended. This is a simple log of what is happening with the Golden Record. A user can view the history of a specific Golden Record by clicking the history tab in the Golden Records detail screen.
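The append-only nature of this history can be pictured with a small Python sketch. The class `GoldenRecord`, its fields, and the action labels are hypothetical and serve only to show a log that grows with every change, starting from the first assignment.

```python
# Hypothetical illustration only; class and field names are assumptions.
from datetime import datetime, timezone


class GoldenRecord:
    def __init__(self, record_id, values):
        self.record_id = record_id
        self.values = {}
        self.history = []  # append-only log of changes
        # The first assignment is logged too, like any later change.
        self.update(values, action="assigned")

    def update(self, changes, action="edited"):
        """Apply the changes and append one history entry describing them."""
        self.values.update(changes)
        self.history.append(
            {
                "at": datetime.now(timezone.utc),
                "action": action,
                "changes": dict(changes),
            }
        )


record = GoldenRecord("GR-1", {"name": "Ann Lee", "city": "Pune"})
record.update({"city": "Oslo"})
```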
Viewing Golden Records Lineage & Relationships
Another important view of Golden Records that users may want to see is the Golden Records Lineage view. As the name suggests, this view shows which specific records across the source systems have been combined to create the Golden Records after the resolving and mastering operations.
Viewing Project Home Screen
Let's circle back to the Project Creation Flow. After the user has mapped the details and "Run" the Project, it is sent as a job to the Cluster. It may take up to a minute for the Job to move to the processing queue and the progress display to appear on the screen. Once the Job starts, the user can see the progress through various stages of the Resolve project through progress bars with text information across the result areas.
Viewing Resolve Project Confidence
There are two important measures related to Golden Records. One is the Model Confidence, and the other is Data Quality. Let us look at Confidence first. The Confidence is shown separately for Entities Resolved and Entities Mastered. The Model Confidence in Resolve, and typically across the product, is split into High, Medium, and Low confidence records, which together give the combined confidence figure.
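To illustrate how records might fall into High, Medium, and Low confidence groups, here is a short Python sketch. The thresholds (0.8 and 0.5) and the function names are assumptions chosen for the example; the product's real cut-offs and the way it derives the combined figure are not given in this section.

```python
# Hypothetical illustration only; thresholds are assumed, not taken from the product.
def bucket(score, high=0.8, medium=0.5):
    """Classify a single confidence score as High, Medium, or Low."""
    if score >= high:
        return "High"
    if score >= medium:
        return "Medium"
    return "Low"


def summarize(scores):
    """Count how many records fall into each confidence bucket."""
    counts = {"High": 0, "Medium": 0, "Low": 0}
    for score in scores:
        counts[bucket(score)] += 1
    return counts


summary = summarize([0.95, 0.8, 0.6, 0.2])
```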