50 docs tagged with "catalog"

Cloning a Catalog

Cloning a Catalog is currently under review and may become available as a beta release in late 2023. We’ll keep you posted.

Cloning a Resolve Project

A "Resolve Project" can be cloned to allow a user to tweak or change the Project's inputs and re-run it while keeping the original Project intact. This is an A/B experiment option provided to users for their Projects.

Cloning and Deleting a Classification Project

A Classification Project can be cloned so that a user can tweak or change the Project's inputs and re-run it while keeping the original Project intact. Think of it as an A/B experiment feature for Projects.

Creating a Classification Project

A user can view the current projects in the Tenant by going to the Data Classification Projects listing screen from the ‘Project’ option in the left navigation panel of the Classify module.

Data Source Content Entitlements

In the previous section, we assigned Data Source Admin rights, but that is not the end of the matter. Each Data Source contains content files or Tables. To use the data from these files or Tables, selected users must also be granted entitlements to them.

Dataset Attributes Feedback

In the Dataset Attributes tab, which opens as the default tab for a Dataset, the user can perform two main actions:

Default Catalogs

Whenever a Tenant (a dedicated environment) is created, the system creates a Default Catalog for the Resolve module. Remember that the module should be licensed for the workflow to work as intended. A Catalog can be compared to a business dictionary, a glossary, or a key-value store. It contains the business entities and their attributes. The system further maps the business entities to datasets and their attributes to dataset columns.
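To make the key-value comparison concrete, here is a minimal sketch of a Catalog as a lookup from business entities and attributes to datasets and columns. It is an illustration only: the class names (`Catalog`, `Entity`, `Attribute`), the `columns_for` helper, and the sample data are all hypothetical and do not reflect the product's internal model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A business attribute and the dataset columns mapped to it (hypothetical)."""
    name: str
    # Each mapping is a (dataset name, column name) pair.
    mapped_columns: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class Entity:
    """A business entity, its attributes, and the datasets mapped to it."""
    name: str
    attributes: dict[str, Attribute] = field(default_factory=dict)
    mapped_datasets: list[str] = field(default_factory=list)


@dataclass
class Catalog:
    """A glossary-like lookup from entity name to entity definition."""
    name: str
    entities: dict[str, Entity] = field(default_factory=dict)

    def columns_for(self, entity: str, attribute: str) -> list[tuple[str, str]]:
        """Look up the dataset columns mapped to one entity attribute."""
        return self.entities[entity].attributes[attribute].mapped_columns


# Illustrative data only; these names are not taken from the product.
catalog = Catalog(name="Default Catalog")
customer = Entity(name="Customer", mapped_datasets=["crm_contacts"])
customer.attributes["Email"] = Attribute(
    name="Email", mapped_columns=[("crm_contacts", "email_addr")]
)
catalog.entities["Customer"] = customer

print(catalog.columns_for("Customer", "Email"))
```

The nested dictionaries mirror the two levels described above: entities map to datasets, and each entity's attributes map to dataset columns.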

Deleting a Catalog

If you are the Catalog Admin of a Catalog, you can delete it through the ellipsis menu that appears for each Catalog in the Catalog List. Clicking Delete Catalog immediately soft-deletes the Catalog and shows a green top-hat notification. Any Concepts and Semantic Objects inside it are also soft-deleted.
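As a rough illustration of a cascading soft delete, the sketch below flags a Catalog and its contents as deleted instead of removing them. The `Catalog` and `Item` classes and the `visible` helper are hypothetical, and hiding soft-deleted Catalogs from the list is an assumption made for the example, not a statement about how the product stores or filters data.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A Concept or Semantic Object inside a Catalog (simplified, hypothetical)."""
    name: str
    deleted: bool = False


@dataclass
class Catalog:
    name: str
    items: list[Item] = field(default_factory=list)
    deleted: bool = False

    def soft_delete(self) -> None:
        """Flag the Catalog and everything inside it; no record is removed."""
        self.deleted = True
        for item in self.items:
            item.deleted = True


def visible(catalogs: list[Catalog]) -> list[str]:
    """Names a listing would show, assuming soft-deleted Catalogs are hidden."""
    return [c.name for c in catalogs if not c.deleted]


sales = Catalog("Sales", [Item("Customer"), Item("Order")])
finance = Catalog("Finance", [Item("Invoice")])

sales.soft_delete()
print(visible([sales, finance]))
print(len(sales.items))
```

Because the records are only flagged, the underlying data still exists after the delete, which is what distinguishes a soft delete from a hard delete.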

Deleting a Project

A user may wish to delete a Resolve Project as part of normal cleanup. This is a soft delete, but there is currently no way to retrieve the Project from the UI. Deleting the Project removes it from the Project list.

Editing a Classification Project

A Classification Project can be edited by any user who has Project Admin rights for that Project. To edit a Classification Project, follow the steps below. You do NOT need to make changes in every step. Note, however, that each step is typically saved when you press the ‘Next’ button, unless the step has its own ‘Apply Changes’ or similar button.

Fixing Tasks

Fixing Tasks, as the name suggests, are Tasks to "fix" any final or remaining "Data Issues" where the Machine Learning model can't be of much help. This usually happens when the model has reached or passed a confidence threshold, after which further tuning or training would lead to diminishing returns.

Getting Started

Log in to your account by accessing the URL provided to you and entering the provisioned User ID and password, as shown below.

Importing Concept Mappings

In the earlier sections, we saw how a user can provide feedback and mappings through various means, including the most recent case where the user can provide training through a workflow.

Importing Synonyms

Apart from creating Synonyms through mapping from the user interface, Synonyms can also be imported into the System in bulk. Synonyms can be imported from three different screens:

Introduction to Catalogs

A Catalog is the business data glossary, data dictionary, or target data model that will be used as a reference for classifying data. A Catalog is composed of Semantic Objects (or data entities) and their underlying business concepts (or entity attributes). An organization may have multiple Catalogs. The Classify system can be trained to independently classify the same object to different data dictionaries, so a Catalog is ultimately the link between the logical business glossary and the physical metadata in the Data Lake.

Introduction to Synonyms

A Synonym is defined in English as “a word or phrase that means exactly or nearly the same as another word or phrase.” A Synonym works the same way in the Fluree Sense software.

Job Types

Both Classify and Resolve let you view Jobs. A Job is simply a process triggered in a non-blocking (asynchronous) fashion: the user can keep working and move from one screen to another while the Job completes in the background. A Job may take anywhere from a couple of minutes to several hours. Its performance depends on the complexity of the work, the available memory and computing power (essentially the cloud specs), and the amount of data.
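The non-blocking behaviour can be sketched with Python's standard `concurrent.futures` module. This is a generic illustration of a background job, not how Classify or Resolve schedule work; the `run_job` function and its timing are made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(rows: int) -> str:
    """Stand-in for a long-running Job; the sleep simulates processing time."""
    time.sleep(0.2)
    return f"processed {rows} rows"


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns immediately with a Future; the Job runs in the background.
    job = executor.submit(run_job, 1000)

    # The caller is free to do other work while the Job is running.
    other_work = [n * n for n in range(5)]

    # result() blocks only at the point where the outcome is actually needed.
    outcome = job.result()

print(other_work)
print(outcome)
```

The key point is that submitting the Job and collecting its result are separate steps, so the time in between is available for other work.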

Managing Catalogs

Once a Catalog is created, it can be edited as required by any user with the Catalog Admin role. Catalog Management provides the following functionality:

Managing Data Source Entitlements

We’ve seen in earlier sections that a user can only modify or access the details of a Data Source where they are the Data Source Admin.

Managing Project Tasks by Admin

In the earlier sections, we've seen how a Project Reviewer, Approver, and Project Admin can provide feedback for Tasks in the Project's "Train Model" screens. Resolve Projects also have a dedicated Manage Project Tasks screen that is accessible only to the Project Admin.

Publishing Semantic Data Set

Once the user has run through Catalog Classification, they can Publish the ‘Semantic Data Set’ to get the benefit of their exercise. Let us understand this concept through an example.

Reassigning Catalog Tasks

Imagine a situation where a Task is assigned to a specific user, but that user is on leave or otherwise unable to work on it. You’d probably reassign it to a co-worker from the same department, if one were available, right?

Tagging of Data

There are two types of Tagging we need to know about as a user:

Technical View of Semantic Objects

The Classify System provides users with the flexibility to examine their Business Objects in a Technical View as well. As the name suggests, this view focuses more on Data to Column relationships.

Training a Classification Project

Once the Project has completed its first ‘Run’, the initial results will be available for viewing. Details of these are available in the section on the Project Home Screen and Project Result. The important thing to note is that most Projects won’t achieve a sufficient level of confidence in just the first run.

Training Catalog Generated Tasks

Catalog Task Training is somewhat like Project Task Training. However, there are some key differences and intricacies. So, let’s look at them.

Training Tasks in Bulk through Import

As we have seen in earlier sections, importing tasks or feedback is the best method for bulk updates. A ‘Bulk Import’ feature is provided for Catalog Tasks as well. To use this feature:

Training Tasks in Bulk through Import

In the earlier section, we saw the actual process of training Tasks from the Task grid, one by one. But there is a quicker and better way if you want to provide feedback on Tasks in bulk.

Viewing Ad-hoc Mappings to Catalogs

The whole purpose of creating a Catalog is the intelligent discovery of data and underlying relationships to match our business dictionary. So, let’s look at that at the simplest level. As a Classify user, you can look at existing ad-hoc mappings to your Data at:

Viewing Entities Mastered

To view “Entities Mastered”, click the “View Results” icon (marked 1) in the lower right panel:

Viewing Project Home Screen

Let's circle back to the Project Creation Flow. After the user has mapped the details and run the Project, it is sent as a Job to the Cluster. It may take up to a minute for the Job to move to the processing queue and for the progress display to appear on the screen. Once the Job starts, the user can follow the Resolve Project through its various stages via progress bars with text information across the result areas.