Cloning a Catalog
As of now, Cloning a Catalog is under review and may be available as a beta release in late 2023. We will share updates as they become available.
A "Resolve Project" can be cloned to allow a user to tweak or change the Project's inputs and re-run it while keeping the original Project intact. This is an A/B experiment option provided to users for their Projects.
A Classification Project can be cloned so that the user can tweak or change the Project's inputs and re-run it while keeping the original Project intact. This works as an A/B experiment feature that lets users experiment with their Project features.
Catalog Training Tasks can be seen in the ‘My Open Tasks’, ‘All Tasks’, or ‘All My Tasks’ area.
A user can view the current projects in the Tenant by going to the Data Classification Projects listing screen from the ‘Project’ option in the left navigation panel of the Classify module.
The prerequisites for creating a Concept Parser are the following:
Synonyms can be created in three ways:
In the previous section, we assigned Data Source Admin rights, but that is not the end of the matter. Each Data Source contains content files or Tables. To use the Data from these files or Tables, selected users must also be granted entitlements to them.
In the Dataset Attributes tab, which opens as the default tab for a Data Set, the user can perform 2 main actions:
Whenever a dedicated environment called Tenant is created, the system creates a Default Catalog for the Resolve module. Remember, the module should be licensed for the workflow to continue optimally. A Catalog can be compared to a business dictionary, a glossary, or a key-value store. It contains the business entities and their attributes. The system further maps the business entities and their attributes to datasets and dataset columns, respectively.
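The key-value comparison above can be illustrated with a minimal sketch. It is not the product's data model or API: the class names, field names, and the `crm_customers` dataset are hypothetical and chosen only to show business entities mapping to datasets and their attributes mapping to dataset columns.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessEntity:
    """A business entity and its attributes (hypothetical shape)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Catalog:
    """A toy Catalog: a glossary of entities plus their physical mappings."""
    name: str
    entities: dict[str, BusinessEntity] = field(default_factory=dict)
    # entity name -> dataset name
    dataset_map: dict[str, str] = field(default_factory=dict)
    # (entity name, attribute name) -> (dataset name, column name)
    column_map: dict[tuple[str, str], tuple[str, str]] = field(default_factory=dict)

    def add_entity(self, entity: BusinessEntity, dataset: str) -> None:
        """Register an entity and map it to a dataset."""
        self.entities[entity.name] = entity
        self.dataset_map[entity.name] = dataset

    def map_attribute(self, entity: str, attribute: str, column: str) -> None:
        """Map one attribute of an entity to a column of the entity's dataset."""
        if attribute not in self.entities[entity].attributes:
            raise KeyError(f"{entity} has no attribute {attribute!r}")
        self.column_map[(entity, attribute)] = (self.dataset_map[entity], column)

    def column_for(self, entity: str, attribute: str) -> tuple[str, str]:
        """Look up the (dataset, column) pair behind a business attribute."""
        return self.column_map[(entity, attribute)]


catalog = Catalog("Default Catalog")
catalog.add_entity(BusinessEntity("Customer", ["email", "phone"]), dataset="crm_customers")
catalog.map_attribute("Customer", "email", column="email_addr")
print(catalog.column_for("Customer", "email"))
```

The lookup runs from business vocabulary to physical location, which is the direction a glossary user typically needs.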
If you are the Catalog Admin of a Catalog, you can Delete it through the ellipsis menu that appears for each Catalog in the Catalog List. Clicking Delete Catalog immediately soft-deletes the Catalog and shows a green top-hat notification. Any Concepts and Semantic Objects inside it are also soft-deleted.
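As a rough illustration of what a cascading soft delete means, here is a minimal sketch. It assumes a simple `deleted` flag on each record; the class and function names are hypothetical and do not reflect how the product stores Catalogs internally.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A Concept or Semantic Object (hypothetical shape)."""
    name: str
    deleted: bool = False


@dataclass
class CatalogRecord:
    name: str
    concepts: list[Item] = field(default_factory=list)
    semantic_objects: list[Item] = field(default_factory=list)
    deleted: bool = False


def soft_delete_catalog(catalog: CatalogRecord) -> None:
    """Flag the Catalog and everything inside it as deleted; nothing is erased."""
    catalog.deleted = True
    for child in catalog.concepts + catalog.semantic_objects:
        child.deleted = True


def visible(catalogs: list[CatalogRecord]) -> list[CatalogRecord]:
    """Return only the Catalogs a listing screen would still show."""
    return [c for c in catalogs if not c.deleted]


sales = CatalogRecord("Sales", [Item("Revenue")], [Item("Invoice")])
finance = CatalogRecord("Finance")
soft_delete_catalog(sales)
print([c.name for c in visible([sales, finance])])
```

The records stay in storage with their flag set, which is what distinguishes a soft delete from a permanent one.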
A user may wish to Delete a Resolve Project as part of a normal Cleanup. This is a soft delete, but currently, there is no way to retrieve the Project from the UI. Deletion of the Project removes it from Display in the project list.
A Classification Project can be edited by any user who has Project Admin rights for that Project. To edit a Classification Project please follow the steps below. Remember that you do NOT need to make changes in all the steps but a specific workflow typically gets saved on pressing the ‘Next’ button unless it has an ‘Apply Changes’ or similar kind of button available in it.
Fixing Tasks, as the name suggests, are the Tasks to "fix" any final or remaining "Data Issues," where the Machine Learning model can't be of much help. This usually happens when a machine learning model has reached or passed a threshold limit of confidence, after which tuning or training would lead to diminishing returns.
Log in to your account by accessing the URL provided to you and entering the provisioned User ID and password as shown below.
Searching for a ‘Search’ Term
Import Semantic format (.owl, .ttl, .rdf) files as Catalog
In the earlier sections, we saw how a user can provide feedback and mappings through various means, including the most recent case where the user can provide training through a workflow.
Apart from creating Synonyms through mapping from the user interface, Synonyms can also be imported into the System in bulk. **Synonyms** can be imported from three different screens:
A Catalog is the business data glossary, data dictionary or target data model that will be used as a reference for classifying data. A Catalog is composed of Semantic Objects (or data entities) and their underlying business concepts (or entity attributes). An organization may have multiple Catalogs. The Classify system can be trained to independently classify the same object to different data dictionaries, so a Catalog will ultimately be the linkage between the logical business glossary, and the physical meta-data in the Data Lake.
In earlier sections for Data Set and Catalog, we saw a few ways of Classification. These are listed as follows:
There are two types of Classification Projects:
A Synonym is defined in English as “a word or phrase that means exactly or nearly the same as another word or phrase.” In the context of the Fluree Sense software as well, a Synonym works the same way.
Both Classify and Resolve provide for Viewing of Jobs. A Job is simply a process triggered in a non-blocking, asynchronous fashion: the user can go on working and moving from one screen to another while the Job completes its work in the background. A Job may take anywhere from a couple of minutes to several hours. Its performance depends on the complexity of the work, the available memory and computing power (essentially the cloud specs), and the amount of data.
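The non-blocking behaviour can be sketched with Python's standard `concurrent.futures` module. This is only an analogy for how a background Job lets the user keep working; `run_job`, the screen names, and the row count are hypothetical and unrelated to the product's actual job engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(rows: int) -> str:
    """Stand-in for a long-running Job; the sleep simulates processing time."""
    time.sleep(0.1)
    return f"processed {rows} rows"


def submit_and_keep_working(rows: int) -> tuple[list[str], str]:
    """Submit a Job, keep 'navigating' while it runs, then collect the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_job, rows)  # returns immediately
        screens_visited = []
        for screen in ("Projects", "Catalogs", "Jobs"):
            screens_visited.append(screen)  # the user is not blocked here
        return screens_visited, future.result()  # waits only when the result is needed


screens, outcome = submit_and_keep_working(1000)
print(screens, outcome)
```

The call to `submit` hands the work off and returns at once, so the loop over screens proceeds without waiting for the Job to finish.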
- Tenant:
Once a Catalog is created, it can be edited as required by any user with a Catalog Admin role. Catalog Management provides for the following functionality:
We’ve seen in earlier sections that a user can only modify or access the details of a Data Source where they are the Data Source Admin.
In the earlier sections, we've seen how a Project Reviewer, Approver, and Project Admin can provide feedback for Tasks in the Project's "Train Model" screens. Resolve Projects also have a dedicated Manage Project Tasks screen that is accessible only to the Project Admin.
Once the user has run through Catalog Classification, they can Publish the ‘Semantic Data Set’ to get the benefit of their exercise. Let us understand this concept through an example.
Imagine a situation where a Task is assigned to a specific user, but that user is on leave or unable to work on it. You’d probably re-assign it to a co-worker from the same department if one were available, right?
Let us check out the Data Quality Rule Views at the Catalog Level. These include:
Supported Platforms
There are two types of Tagging we need to know about as a user:
The Classify System provides users with the flexibility to examine their Business Objects in a Technical View as well. As the name suggests, this view focuses more on Data to Column relationships.
Once the Project has completed its first ‘Run’, the initial results will be available for viewing. Details of these are available in the section on the Project Home Screen and Project Result. The important thing to note is that most Projects won’t achieve a sufficient level of confidence in just the first run.
Catalog Task Training is somewhat like Project Task Training. However, there are some key differences and intricacies. So, let’s look at them.
As we have seen in earlier sections, importing tasks or feedback is the best method for bulk updates. For Catalog Tasks as well, we provide the ‘Bulk Import’ feature. To use this feature:
In the earlier section, we saw the actual process of Training Tasks from the Task grid, one by one. But there is a quicker and better way if you want to provide feedback on Tasks in bulk.
Types of Users and Roles available
The whole purpose of creating a Catalog is the intelligent discovery of data and underlying relationships to match our business dictionary. So, let’s look at that at the simplest level. As a Classify user, you can look at existing ad-hoc mappings to your Data at:
To view “Entities Mastered”, click the “View Results” icon (marked 1) in the lower right panel:
Let's circle back to the Project Creation Flow. After the user has mapped the details and "Run" the Project, it is sent as a job to the Cluster. It may take up to a minute for the Job to move to the processing queue and the progress display to appear on the screen. Once the Job starts, the user can see the progress through various stages of the Resolve project through progress bars with text information across the result areas.