Talend Data Catalog 8.1
- NEW FEATURE OVERVIEW
This update focuses on new business user features such as user collaboration, data documentation, business information diagramming, sharing, ownership, and faceted search. They involve the new Articles and Diagrams, as well as improvements to existing capabilities (Collections, Worksheets, Dashboards, Presentations). These new features are critical for data shopping, data trust, and data health applications.
In addition, new usage analytics capabilities have been added to examine user growth, user search popularity, object inventory growth, glossary growth, documentation coverage, data classification growth, data lineage coverage, user collaboration growth, etc.
- NEW FACETED SEARCH AND WORKSHEETS FOR BUSINESS USERS
The search UI has been fully redesigned for a better business user experience, including accelerated faceted search filters that are fully adjustable for different personas. The filters and faceted search preferences can be saved by the user and exported into a Worksheet for advanced analysis.
There are numerous other search improvements, with additional adjustable search criteria all aimed at returning the best results first. One example is the use of naming standards (which can differ per persona) to find the right tables when a business user searches by full business name; this also works the other way around when searching by acronym. In addition, there is a search details panel on the right. When this panel is shown, you may click on an object and view its properties in the panel. It is presented much like the Overview tab in the object page or details page, and may be customized when the Overview tab is customized.
Worksheets provide metadata reporting capabilities where both search and browse (as well as a direct link in the OBJECTS menu) lead to a common worksheet page:
- One may start from search and then migrate to a worksheet, which allows simple text filtering (e.g. customer) as the basis for a worksheet
- One may easily start from a category (e.g. database / tables)
- One may save and share worksheets so that other users may quickly reproduce and build on earlier queries/filtering/column selections
- Direct access to a default worksheet does not predefine anything.
See
- Search in general
- Worksheets in general
- New faceted search worksheet layout
- MQL-based Query Filters
- Faceted search filter editing examples
- Search execution history
- IMPROVED SEARCH PERFORMANCE
Search is now implemented by a dedicated Solr server rather than local Lucene index files managed by the MM application server. As a side effect, the overall performance of the MM server and search has been significantly improved. For example, a Worksheet might have a million rows that will be indexed and sorted. See Solr Indexing Server.
- IMPROVED USER EXPERIENCE
The UI has been significantly improved for better clarity and user experience in many areas, in particular:
- Selecting users (or groups) for different use cases (such as filter per user/group, or worksheet sharing to users) has been harmonized and improved for usability (to search) and scalability to a very large number of users.
- Managing Users now offers a paginated UI with filters allowing for much improved scalability.
- User Activity log's UI layout has been redesigned with a new look and feel.
- Object History / change log's UI layout has been redesigned with a new look and feel.
- NEW OBJECT ATTRIBUTE GROUPING STRUCTURE DATA TYPE
This new feature allows for the grouping of object attributes of the same domain/purpose into a dedicated structure data type. That is the case for the existing data profiling attributes (e.g. count, distinct, min value, etc.) or any future operational metadata attributes (e.g. start, end of DI/ETL job executions).
This new grouping structure is not only available for predefined attributes, but also for custom attributes (Manage > Metamodel) where an administrator can create an Address structure with Street, City, ZIP.
This new grouping structure is operational in the MQL language such as "Data Profiling".count and Address.City in Worksheets.
This new feature significantly improves the user experience and readability of a long flat list of object attributes by offering separate widgets for each grouping structure. More importantly, this grouping can be used in the UI customization (dashboard) as an optional widget for just data profiling, operational metadata, etc.
- IMPROVED LOCAL DATA DOCUMENTATION
The vocabulary used in the data documentation process had several use cases of "definition", "description", etc., that behaved differently in the system and were difficult to search across or understand the results in worksheets with different types of objects.
- This vocabulary has been simplified and harmonized with the following attributes:
- "Name" is unchanged as the physical name (e.g. CUST) of an imported object (e.g. Table) or the actual name of a custom object (e.g. Term).
- "Business Name" is unchanged as the data documented logical name (e.g. Customer) of an imported object (e.g. Table).
- "Definition" replaces and merges "Description" and "Business Description" as the short un-formatted text that defines any object (i.e. can be used as tooltip).
- "Source System Definition" on imported objects replaces any use of "Definition", "Description", or "Comment" in the source system metamodel (profile) imported from data modeling, data integration, and business intelligence tools.
- "Description" replaces "Long Description" as HTML formatted text of unlimited length that can include images, tables, etc.
- Consequently, the default data documentation attributes are defined as follows:
- Any imported object has a Name, a Business Name and a Definition available by default and may also have a Source System Definition, but will not have a Description (Administrators may add it in Manage Metamodel).
- Any custom object only has a Name by default, but does not have a Business Name, Definition or Description (Administrators may add them in Manage Metamodel).
- Note that imported models from logical/physical data modeling tools (e.g. Erwin) have imported objects (e.g. table) that may have both:
- a logical definition of that table called "Definition" or "Description", which is now called "Source System Definition",
- a physical definition of that table called "Comment" which comes from the SQL COMMENT concept.
- IMPROVED MAPPED DATA DOCUMENTATION
Semantic Mapping and Glossary Term Classification have been unified into the single concept of a new "Is Defined By" binary relationship type (see further below), with instances stored in the new Semantic Mapping now available as custom models (see further below).
- IMPROVED DATA DOCUMENTATION EDITOR
As a consequence of the above local and mapped data documentation improvements, the data documentation presented in the documentation editor and the semantic flow lineage tab has been grouped as Defined, Mapped, or Inferred:
- DEFINED Locally: uses editable "Business Name" and "Definition" standard attributes.
- MAPPED Semantically: uses other objects (e.g., Term in a Glossary) explicitly and directly connected with the "Is Defined By" standard relationship stored in a Semantic Mapping.
- Inferred Documentation uses other means of defining documentation in this priority order:
- RELATED Semantically: Documentation from other objects directly connected with a custom relationship (e.g., "Complies To") set to participate in semantic flow (other than the "Is Defined By" standard relationship stored in a Semantic Mapping).
- INFERRED from Lineage: Documentation from object lineage by way of pass-through data flows and semantic flows.
- CLASSIFIED by Term: Documentation from Terms associated with Data Classes (e.g., "Date Of Birth") set to this column or field object.
- IMPORTED Definition: Documentation from read-only "Source System Definition" standard attribute.
- SEARCHED Term: Documentation from Terms with matching names searched in Glossaries.
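The priority order of the inferred documentation sources listed above can be read as a simple fallback chain. The following Python sketch is only an illustration of that ordering: the `InferredSource` enum, the `pick_inferred` function, and the dictionary shape are hypothetical names chosen here, not part of the product or its API.

```python
from enum import IntEnum
from typing import Optional


class InferredSource(IntEnum):
    """Inferred documentation sources; a lower value means a higher priority.

    The member names mirror the list above; the numeric values are an
    assumption used only to encode the stated order.
    """
    RELATED_SEMANTICALLY = 1
    INFERRED_FROM_LINEAGE = 2
    CLASSIFIED_BY_TERM = 3
    IMPORTED_DEFINITION = 4
    SEARCHED_TERM = 5


def pick_inferred(
    candidates: dict[InferredSource, str],
) -> Optional[tuple[InferredSource, str]]:
    """Return the highest-priority non-empty candidate, or None if there is none."""
    for source in sorted(InferredSource):
        text = candidates.get(source, "").strip()
        if text:
            return source, text
    return None


# Hypothetical example: two sources offer documentation for the same column.
example = {
    InferredSource.SEARCHED_TERM: "A person who buys goods.",
    InferredSource.IMPORTED_DEFINITION: "Customer master table.",
}
print(pick_inferred(example))
```

In this example the imported definition is selected because it ranks above a searched term in the order given above.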
- NEW HTML DOCUMENTATION EDITOR
The user experience of editing descriptions, comments, or the new articles and issues (see below) has been dramatically enhanced with a brand-new bundled WYSIWYG (What You See Is What You Get) HTML editor. This editor is available for any custom attribute of HTML data type. This editor brings the equivalent of Google Docs or Microsoft Word within this web application, including all the usual text formatting capabilities, image management, and even copy/paste with formatting from Word or HTML pages.
- NEW DOCUMENTATION SUPPORT FOR OBJECT AND USER MENTIONS
In addition, the above newly bundled WYSIWYG HTML editor (of descriptions, comments, and articles) has been enhanced to support mentions of users (e.g. @John) and objects (e.g. @Customer). Users creating new object or user mentions benefit from automatic assistance to auto-complete, or from a more sophisticated search to find the right user or object. Existing mentions are automatically maintained within the documentation upon any renaming of the mentioned object or user.
- NEW ARTICLE OBJECT TYPE
Descriptions can be associated with any harvested object (e.g. imported table) or custom object (e.g. a glossary term). They now benefit from the above new bundled WYSIWYG HTML editor with object and user mentions, but are not intended to be full length documents.
Articles are designed for business users to develop and collaborate on any kind of document such as review reports, change requests, white papers, user guides, overviews, etc. Articles are implemented by a new predefined object "Article" with a predefined attribute "content" of HTML data type.
A new pre-installed "Standard Extension Articles" package allows users to create new models of type "Articles" which contains the Article object type (just like Glossary contains Terms). Manage Metamodel allows one to extend the Article object type with custom attributes or links to other custom objects. Articles benefit from the same capabilities as any other custom objects including search (MQL), security, as well as the ability to have comments and mentions, and may even operate under workflow.
- NEW ISSUE OBJECT TYPE
An Issue has an HTML based rich text formatted description that can contain images, tables, and even mentions of users and objects.
- An Issue also has the classic attributes (e.g. Status, Priority, Assignee, Reporter) and the relationships (e.g. Blocks, Related To, Duplicates) commonly used by issue tracking systems such as Atlassian JIRA.
- A new pre-installed "Standard Extension Issues" package allows users to create new models of type "Issues" which contains the Issue object type (just like Glossary contains Terms). Manage Metamodel allows one to extend the Issue object type with custom attributes or links to other custom objects. Issues benefit from the same capabilities as any other custom objects including search (MQL), security, as well as the ability to have comments, and even operate under workflow.
- NEW (MANAGE METAMODEL) STANDARD OBJECT TYPES
The standard package offers additional predefined object types to model the existing Data Mappings, Semantic Mappings, and the new generation Data Models as object types, including:
- New relationships as object types (also known as n-ary relationships in ER modeling or relationship as class in UML) which can have attributes.
These new relationship as objects are directional (roles can be source or target) and can optionally carry semantic flow (on all roles in such case).
Note that these new abstract relationship object types cannot be subtyped in this release.
There are two types of relationship as objects:
- The "Binary Relationship" abstract object type connects only two objects at the instance level (with subtypes such as the new "Semantic Link" object type).
- The "N-ary Relationship" abstract object type connects more than two objects at the instance level (with subtypes such as the new "Classifier Map" object type).
- New root abstract object types (required as source/target of relationships that can apply to any repository object) as listed below:
Note that these new abstract root object types are virtual (implemented as filters) and cannot be subtyped.
- "Any Object" abstract object type represents any standard, custom or imported object type (as used in the Defines/Is Defined relationship on the new Semantic Link object type).
- "Any Imported Object" abstract object type is a (virtual) subtype of "Any Object" representing only Imported Objects created by import bridges.
- New base object types (required for data mappings) as listed below:
Note that these new abstract base object types cannot be subtyped in this release.
- "Any Classifier" object type represents any database table, file system file, etc. (as used in the source/target relationships on the new Classifier Map and Feature Map object types).
- "Any Feature" object type represents any table column, file field, etc. (as used in the source/target relationship on the new Feature Map object type).
- NEW DATA MAPPING CAPABILITIES AND OBJECT TYPES
- Data Mappings are now modeled as objects as instances of the new "Data Mapping" model type (in Manage Metamodel) which includes new object types: Data Mapping Folder, Classifier Map (with subtypes: Replication Mapping and Query Mapping), and Feature Map.
These new data mapping objects benefit from the same capabilities as any other custom objects including search (MQL), security, as well as the ability to have comments, and even operate under workflow.
- New Replication Mapping (evolution of Bulk Mapping) allows for replication between tables and files of matching structures supporting automatic update as columns/fields get added/removed. Replication Mapping is supported between tables (of possibly different database servers / technologies such as SQL Server to Snowflake), between files (e.g. CSV), and between databases and files (e.g. load/unload of files as database tables). Replication mapping supports both flat structures (CSV files, RDBMS tables) and hierarchical structures (JSON files and NoSQL structures).
- NEW SEMANTIC MAPPING CAPABILITIES AND OBJECT TYPES
- Semantic Mappings are now modeled as objects as instances of the new "Semantic Model" model type (in Manage Metamodel) which includes a new Semantic Link object type.
These semantic link objects benefit from the same capabilities as any other custom objects including search (MQL), security, as well as the ability to have comments, and even operate under workflow.
- New search / worksheet driven semantic mapping editing capabilities.
- NEW DATA MODELING CAPABILITIES AND OBJECT TYPES
Data modeling can be performed externally with data modeling tools (e.g. Erwin) whose models can be imported into MM and then stitched to a matching imported database. Alternatively, relational databases could be imported as a Physical Data Model (PDM) instead of a regular imported Model, where local documentation and diagrams could be defined. This PDM capability has been deprecated, as it was replaced (a few years ago) by the introduction of the Relationship and Diagram tabs on any imported database, enabling users to automatically detect, define, and document relationships, and to design ER diagrams. These data modeling capabilities were still limited to relational databases. This new release fully redesigns the data modeling capabilities with many new features:
- GENERALIZED DATA MODELING
- Data modeling is no longer limited to relational (RDBMS) databases, but now also supports hierarchical (NoSQL) databases, and object stores (e.g. JSON in Amazon S3).
- Data modeling is no longer limited to a given RDBMS schema (as with data modeling tools like Erwin for PK/FK relationships), but now also supports relationships and diagrams between Classifiers (tables or files) located anywhere:
- in any catalog or schema of a given database server (multi-model of an imported model).
- in any database models (e.g. the Customer ID of a table in the DW database in Snowflake and in the Sales database in SQL Server).
- in any technologies (e.g. the PO number of a table in the DW database in Snowflake and of a field of a JSON file in Amazon S3).
- Data modeling is no longer limited to entity relationships of (any) data stores, but now also supports any standard or custom relationships (defined in Manage Metamodel), which now even include Classifier Map, Feature Map, Semantic Link, and many more. This opens the door to multi-purpose business diagrams (as explained below) involving different types of relationships to illustrate a use case.
- DATA MODELS AS OBJECTS
As with Data Mappings and Semantic Mappings, Data Models are now modeled as objects, as instances of the new "Data Model" model type (in Manage Metamodel), which includes new object types: Data Model Folder, Entity Relationship containing Column Mapping(s), and ER Diagram containing ER Diagram Object(s).
These new data model objects benefit from the same capabilities as any other custom objects including search (MQL), security, as well as the ability to have comments, and even operate under workflow.
- NEW ER DIAGRAMS
including new graphical layout and rendering properties on objects and relationships (colors, icons, fonts, etc.), and multi-purpose use:
- as Technical Data Model Diagrams:
This is the primary use case of ER Diagrams, fully replacing the use of any external data modeling tool for data documentation, and far more powerful as it spans multiple data stores and technologies (RDBMS, NoSQL, object stores).
- as Business Use Case Diagrams:
These new diagrams can be more business oriented than a pure technical ER Diagram by allowing graphical decorations and any additional object and relationship types (besides joins or PK/FK), such as a Classifier Map, Feature Map, Semantic Link, or any custom relationships to illustrate a use case.
- as Object Navigator/Explorer Diagrams:
Starting from a given object, users can now graphically expand/navigate any relationships with various automatic layouts (e.g. flow).
- Not a substitute for Data Flow and Semantic Flow Diagrams:
Although the new ER Diagrams are multi-purpose for any relationships between entity/object of any model (as explained above), they are not a substitute / replacement of the existing critical interactive analysis diagrams which are:
- Data flow Diagrams for data lineage and impact analysis,
- Semantic Flow Diagrams for semantic definition analysis.
- NEW ENTITY RELATIONSHIPS
- supporting any relationship types (besides joins or PK/FK),
- enabling worksheet / bulk editing of relationships, as well as CSV import/export.
- NEW ENTITIES
(This feature may be released post GA as a cumulative patch)
- Allowing the creation of new entities for conceptual / logical data modeling for Enterprise Data Models or new data store requirements.
- NEW DATA FLOW LINEAGE ANALYSIS DIAGRAMS
using fewer objects to render much bigger data flow lineage traces, and allowing:
- to decorate objects with tags (such as sensitivity label or PII), and
- to compare the lineage with previous version of that data flow.
- NEW BUSINESS PROCESS MODELS
compliant with the Object Management Group (OMG) Business Process Model and Notation (BPMN) standard (see https://www.bpmn.org) with:
- support for importing BPMN XML diagrams from third-party process modeling tools (such as https://www.lucidchart.com/),
- support for linking BPMN diagram's data store objects to actual repository objects such as database model, schema, table, etc.
- NEW REFERENCE DATA MODELS
(This feature may be released post GA as a cumulative patch)
- Code set mappings, and more.
- IMPROVED DATA SAMPLING AND DATA PROFILING
- New data request methods: fast "Top" (now the default) vs. "Random" (reservoir sampling when available on the database) vs. "Custom Query" (on selected tables)
- New data request scope: subset of tables defined by a provided MQL (e.g. tables from a set of schemas, or table with/without a user defined data sampling flag)
- New data overwrite protection (on selected tables) to prevent an automatic data import (e.g. when a previous long random sampling had been performed)
- New data import operation independently of the metadata import operation (the option to automatically perform data import post metadata import remains enabled by default) but explicit data import can now be requested by API or scheduled (Manage Schedules).
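To make the difference between the "Top" and "Random" request methods concrete, here is a minimal Python sketch of the general techniques. It is not the product's implementation (which, per the text above, relies on the database when reservoir sampling is available there); the function names `top_sample` and `reservoir_sample` are illustrative assumptions. The reservoir variant is the classic single-pass "Algorithm R".

```python
import itertools
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def top_sample(rows: Iterable[T], k: int) -> list[T]:
    """'Top' style sampling: simply take the first k rows (fast, but biased)."""
    return list(itertools.islice(rows, k))


def reservoir_sample(rows: Iterable[T], k: int, rng: random.Random) -> list[T]:
    """'Random' style sampling: uniform sample of k rows in one pass (Algorithm R)."""
    reservoir: list[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the new row is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


rng = random.Random(42)  # fixed seed so repeated runs give the same sample
print(top_sample(range(1_000), 5))
print(reservoir_sample(range(1_000), 5, rng))
```

The "Top" method stops after k rows, which is why it is fast, while reservoir sampling has to scan every row once to give each row an equal chance of being selected.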
- IMPROVED HIGH LEVEL SHAREABLE USER OBJECTS (Collections, Worksheets, Dashboards)
High level user defined objects (e.g. Collections, Worksheets, Dashboards, or Presentations) now have more powerful sharing capabilities with the notions of Owners, Viewers, and Editors available through a user-friendly UI similar to popular cloud object stores like Google Drive.
Collections, Worksheets, Dashboards, Users and Groups are now available in the global search and MQL.
- NEW DATA QUALITY
with new Data Quality tab allowing:
- New import from Data Quality tools:
- From a commercial tool such as the import bridge from Informatica Cloud Data Quality
- From any other unsupported tools (or in house DQ) using the import bridge from the Meta Integration Data Quality (DQ) CSV Format.
- New ability to stitch (connection resolution) Data Quality models with their associated data store models.
- New Data Quality tab in the UI with data quality score widgets and histograms
- New Search / Worksheet reporting on Data Quality info
- New pre-defined conditional labels on data quality in data flow diagrams
- NEW DATA SOURCE ACCESS HISTORY
New data source access history attributes (including Popularity Count, Last Access Users, Last Data Access Date, and Last Updated Date) are now available on selected supported objects (including database tables and views or BI reports) of imported models from selected supported tools (such as Snowflake, Google BigQuery, or Tableau).
- NEW TOOL INTEGRATION
with new Manage > Tool Integration menu allowing:
- Browser Extension (Chrome) tool integration that automatically displays MM data catalog information for the objects displayed by the web page of a supported web application tool, including:
- Business Intelligence web apps like Qlik Sense, Tableau, Azure Power BI
- Data stores web apps like Snowflake
- Data Quality web apps like Informatica Cloud Data Quality
- Issue Management tool integration like Jira (Summer 2024)
- Communication tool integration like Teams (Summer 2024)
- NEW USAGE ANALYTICS
A new repository operation "Export analytics" (which can be scheduled on a daily basis) generates usage analytics from the repository database, API, audit log, and search index into CSV files (by default in $MM_HOME//data/files/mm/analytics). Such files can be analyzed with the customer's BI tool of choice (such as Microsoft PowerBI or Tableau); an example is provided in $MM_HOME/conf/Template/analytics/demo/demo.pbix. Possible usage analytics currently include:
- Control over the usage analytics scope (selected configuration, or entire repository) and the interval (Days, Months, Years).
- User growth and login per day
- User search (count, popularity)
- Object Inventory (model count, model types, object count, object types, object growth)
- Glossary (term count and growth)
- Documentation (object with documentation count and growth, top documented models)
- Data Classification (object with data classes count and growth, top data classes, data classes count and growth)
- Data Lineage (object with lineage count and growth, model connection count and growth)
- User Collaboration (count and growth of endorsements, certifications, warnings, comments, and attachments)
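Because the exported analytics are plain CSV files, they can also be inspected with a few lines of script instead of a BI tool. The sketch below is an assumption-laden illustration: the actual file names and column layout of the exported files are not described here, so the `date` and `user_count` columns, the inline sample data, and the `growth_per_interval` function are all hypothetical.

```python
import csv
import io

# Hypothetical sample data; the real exported files may use different columns.
SAMPLE_CSV = """date,user_count
2024-01-01,100
2024-01-02,104
2024-01-03,110
"""


def growth_per_interval(csv_text: str) -> list[tuple[str, int]]:
    """Return (date, increase since the previous row) for each row after the first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(record["date"], int(record["user_count"])) for record in reader]
    return [
        (day, count - previous_count)
        for (_, previous_count), (day, count) in zip(rows, rows[1:])
    ]


for day, increase in growth_per_interval(SAMPLE_CSV):
    print(f"{day}: +{increase} users")
```

The same pattern applies to any of the count-and-growth metrics listed above, once the real column names of the exported files are known.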
- NEW SUPPORT FOR MULTI CATALOG DATABASE IMPORT
A critical aspect of importing metadata from large servers is the support for multi-model incremental harvesting, where the import bridge can detect changes and efficiently harvest only the subset that has been updated. In the case of a large BI server, only the models of the changed reports are imported. In the case of a large DB server, only the models of the changed schemas are imported. Not only is this multi-model incremental harvesting much faster, but it also minimizes the space needed in the repository (with version and configuration management license) by reusing the models which did not change.
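The idea of multi-model incremental harvesting can be sketched as a comparison between what was harvested last time and what the server reports now. The Python below is a conceptual illustration only, not how the import bridges actually detect changes: the fingerprinting approach and the names `fingerprint` and `plan_incremental_harvest` are assumptions made for this example.

```python
import hashlib
import json


def fingerprint(schema_metadata: dict) -> str:
    """Stable hash of a schema's metadata (keys are sorted for determinism)."""
    canonical = json.dumps(schema_metadata, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_incremental_harvest(
    previous: dict[str, str], current: dict[str, dict]
) -> dict[str, list[str]]:
    """Split schemas into those to re-import and those whose model can be reused.

    previous: schema name -> fingerprint recorded at the last harvest
    current:  schema name -> metadata as seen on the server now
    """
    plan: dict[str, list[str]] = {"import": [], "reuse": []}
    for name, metadata in sorted(current.items()):
        if previous.get(name) == fingerprint(metadata):
            plan["reuse"].append(name)   # unchanged: keep the existing model
        else:
            plan["import"].append(name)  # new or changed: harvest again
    return plan


# Hypothetical example: SALES gained a table, HR is unchanged, FIN is new.
last_harvest = {
    "SALES": fingerprint({"tables": ["ORDERS"]}),
    "HR": fingerprint({"tables": ["EMP"]}),
}
server_now = {
    "SALES": {"tables": ["ORDERS", "ITEMS"]},
    "HR": {"tables": ["EMP"]},
    "FIN": {"tables": []},
}
print(plan_incremental_harvest(last_harvest, server_now))
```

Only the changed or new schemas end up in the import list, which is what makes the harvest faster and lets the unchanged models be reused in the repository.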
Currently, most database import bridges require the selection of a single database catalog, apart from SQL Server, which allowed the import of multiple catalogs at once (in which case all schemas of a given catalog were stored as a single model).
With this improvement, the database import bridges from popular large cloud servers like Snowflake, Google BigQuery, SAP HANA, Presto, and Microsoft SQL Server (including on Azure) now provide native multi catalog support with multi schemas represented as multi-models. This improvement reduces the number of individual Models Import to configure, reduces the amount of repository storage needed (with version and configuration management license), and accelerates the incremental harvesting. In addition, this improvement also significantly facilitates the automatic metadata stitching (connection resolutions) at the entire database server level, automatically resolving changes on the underlying catalogs and their respective underlying schemas. Finally, this improvement strengthens data governance by allowing responsibilities to be added (Add Roles) at any level, from the entire server model down to any catalog or schema.
- THIRD-PARTY SOFTWARE UPDATES
All third-party and open source software has been upgraded to the latest versions for bug fixes, improvements, and better security vulnerability protection. For more details, see Bundled Third-Party Software.
- SECURITY VULNERABILITY UPDATES
Numerous major improvements to resolve any new security vulnerabilities, including from third-party software upgrades like the use of Java 17 (instead of Java 11 which is no longer supported for security vulnerability fixes).
- PRE UPGRADE REQUIREMENTS
- Same steps as any previous releases.
- Physical Data Models (PDM) were deprecated (and replaced by the local data modeling) in 10.0 (2018) but remained available in 11.0 (2022). PDM is now officially EOL and no longer available in 11.1; therefore make sure that any legacy PDM models were migrated to regular (imported) Models prior to this upgrade.
- POST UPGRADE ACTIONS
- Same steps as any previous releases.
- SECURITY VULNERABILITY IMPROVEMENT IMPACT
- REST API Help (Swagger based MMdoc web app) is no longer enabled/deployed on default installation.
You must first use $MM_HOME/Setup.sh -we MMDoc -wa MMDoc on the main MM server, as explained in the REST API Documentation Setup.
- MANAGE > Servers: The default installation of a remote harvesting agent server only allows for a Local Network connection.
You must first use $MM_HOME/Setup.sh -wa MIMBWebServices on that remote harvesting server to be reachable by the main MM server.
- IMPROVED DATA DOCUMENTATION
Update any external REST API based applications using MQL involving Description or Long Description.
- NEW SUPPORT FOR MULTI CATALOG DATABASE IMPORT
Perform a full re-import of the multi-catalog database (e.g. SQL Server or Snowflake), the surrounding ETL/DI tools (e.g. Informatica PowerCenter or Talend), and BI tools (e.g. Microsoft PowerBI, Tableau) before taking advantage of the new multi-catalog connection resolutions (i.e. stitching and configuration build).
- IMPROVED SEARCH
See Legacy Local Lucene Index File Known Limitation, and therefore consider at least Migrating from Local Lucene Files to Bundled Solr Indexing Server, or better Migrating from Local Lucene Files to External Solr Indexing Server.
- IMPROVED HIGH LEVEL SHAREABLE USER OBJECTS (Collections, Worksheets, Dashboards)
As a side effect of this major improvement, the URLs of Collections, Worksheets, and Dashboards have changed. Any such URL that was hard coded when manually editing HTML must be updated.