



Data Virtuality Tutorial Notes


I have been working with Data Virtuality for a while, and I am really impressed by the features of this data virtualization platform. I could easily connect to SQL Server, Oracle and SAP HANA databases, as well as a number of data warehouses like Exasol, Amazon Redshift, Pivotal Greenplum, Snowflake, etc. It is also possible to connect to files in an Amazon S3 bucket and query delimited files such as CSV files using standard SQL. Data architects can design solutions where data from web services and web APIs is queried using SQL in a JOIN with relational database tables.
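To make the idea of "CSV data joined with relational data in plain SQL" a bit more concrete, here is a minimal Python sketch. It is not Data Virtuality code: it uses the standard-library sqlite3 module as a stand-in, and the table names, the sample CSV text and the helper functions are all made up for illustration. It also copies the CSV rows into a table up front, whereas a data virtualization platform would read the file on demand.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a delimited file in an S3 bucket.
CSV_TEXT = """customer_id,amount
1,120.50
2,80.00
1,35.25
"""


def load_csv_as_table(conn, table, text):
    """Expose CSV rows as a SQL table (an eager copy, for illustration only)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.execute(f"CREATE TABLE {table} (customer_id INTEGER, amount REAL)")
    # Column affinity converts the CSV strings to INTEGER / REAL values.
    conn.executemany(
        f"INSERT INTO {table} VALUES (:customer_id, :amount)", rows
    )


def totals_by_customer():
    conn = sqlite3.connect(":memory:")
    # The "relational database" side of the join.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    # The "delimited file" side of the join.
    load_csv_as_table(conn, "orders_csv", CSV_TEXT)
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders_csv AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(totals_by_customer())
```

The point is only that the consumer writes one ordinary JOIN and does not care that one side started life as a file.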

Of course, the aim of data virtualization is to avoid replicating data across platforms and to access it on demand at its source. Still, for the sake of performance, data engineers can sometimes choose to replicate the data in an analytical storage. I am using Amazon Redshift at the moment for storing such data, using the concept of materialized views.

The real difference which puts the software ahead of its rivals is its query optimization engine, especially for federated queries. If the queried data is read from different data sources, the SQL query optimization engine has to do a brilliant job of managing data access in real time.

Data virtualization tools like Data Virtuality and Denodo (Denodo deserves a mention whenever data virtualization is the topic) provide additional tools for scheduling jobs, tracking data lineage, maintaining a data catalog, managing users and roles, etc.

All of these data management and querying features, including data modeling and building business views on top of the data sources’ tables and views, allow us to call data virtualization software a Logical Data Warehouse. This concept, “Logical Data Warehouse”, distinguishes data virtualization tools from data warehouse platforms: they don’t store data, they are just an abstraction layer between the data consumers and the data sources.

Although I have worked with the tool for a while, starting with a workshop and a PoC (Proof of Concept) and continuing with the management and maintenance of a production Data Virtualization platform, I have started reading the Data Virtualization tutorial provided by Data Virtuality on their official web site.

I must confess that this tutorial is very simple and can only be considered an introduction to data virtualization. Compared with Denodo’s tutorials, learning and certification resources, Data Virtuality should invest more effort in this area.

Anyhow, I just wanted to put my notes here so that I can come back and revise them when I need to, and update them with fresh information in the future.

Let’s start...

Data Virtualization -> Logical Data Warehouse -> Data Virtuality

Data Virtualization enables “Single source of truth” and “high data quality”

Data Virtuality is a data integration platform for instant data access

DV enables easy data centralization and data governance

DV empowers rapid BI prototyping and reporting solutions for data visualization front-end tools

Data virtualization platforms provide metadata repositories for master data management

Data lineage is possible with data virtualization software

Data Virtuality has more than 200 connectors, which provide easy real-time connection to a wide variety of data sources and systems.

Data virtualization platform provides a SQL data modeling layer for data architects and data engineers. It is possible to build your own data model on top of underlying data sources using virtual schemas and virtual views.

It is possible to replicate data and materialize data source tables and views, as well as to access data in real time.

Logical Data Warehouse platforms retrieve the structure and metadata of the data sources’ views and tables and serve these to their consumers in a unified data model. Data consumers can easily access data from a wide variety of data sources using standard SQL, regardless of the data source language.

DV tools can materialize tables and views from data sources into an analytical storage to increase query performance. It is possible to define a schedule to run the jobs that materialize data.

Cross database joins are possible and data architects can build data models on top of these federated queries using virtual schemas and virtual views.
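A cross-database join with a view on top can be imitated on a very small scale with SQLite's ATTACH DATABASE, as in the Python sketch below. This is only an analogy under my own assumptions: the two attached in-memory databases (crm, erp) stand in for independent source systems, and the table and view names are invented. The view is created as a TEMP view because SQLite only allows temporary views to reference tables in other attached databases.

```python
import sqlite3


def build_federation():
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database,
    # standing in for an independent source system.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute(
        "CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE erp.invoices (customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    conn.executemany(
        "INSERT INTO erp.invoices VALUES (?, ?)",
        [(1, 100.0), (1, 50.0), (2, 20.0)],
    )
    # The "virtual view": a data model built on top of the cross-database join.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS customer, SUM(i.amount) AS revenue
        FROM crm.customers AS c
        JOIN erp.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        """
    )
    return conn


def revenue_report(conn):
    # The consumer queries the view and never sees crm or erp directly.
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    connection = build_federation()
    print(revenue_report(connection))
    connection.close()
```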

The Data Virtuality server exposes data to many industry-standard reporting tools and applications over JDBC / ODBC or REST.

For data consumers, all the complexity and variety of the data sources is invisible. To applications connected to a data virtualization platform, the data appears just as it would in a regular relational database (RDBMS).

For managing federated queries, Data Virtuality has a Federated Relational Query Engine in addition to the classical code push-down strategy.
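The benefit of push-down can be shown with a small Python sketch. It says nothing about how Data Virtuality's engine works internally; it only contrasts the two strategies against a SQLite table that plays the role of a remote source. The events table and both function names are hypothetical. Each function returns the number of rows transferred from the source and the number of rows that matched.

```python
import sqlite3


def make_source():
    """Create a stand-in source system with 1000 rows, 100 of them for 'DE'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, country TEXT)")
    conn.executemany(
        "INSERT INTO events (country) VALUES (?)",
        [("DE",) if i % 10 == 0 else ("US",) for i in range(1000)],
    )
    return conn


def without_pushdown(source, country):
    """Fetch every row from the source, then filter in the engine."""
    rows = source.execute("SELECT id, country FROM events").fetchall()
    matches = [row for row in rows if row[1] == country]
    return len(rows), len(matches)


def with_pushdown(source, country):
    """Send the predicate to the source so only matching rows are transferred."""
    rows = source.execute(
        "SELECT id, country FROM events WHERE country = ?", (country,)
    ).fetchall()
    return len(rows), len(rows)


if __name__ == "__main__":
    src = make_source()
    print("without push-down:", without_pushdown(src, "DE"))
    print("with push-down:   ", with_pushdown(src, "DE"))
    src.close()
```

Both strategies return the same matches, but the first one moves ten times as many rows across the (here imaginary) network.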

The Data Virtuality query optimizer is mostly a rule-based optimizer with some cost-based features.

View Builder is a graphical user interface (GUI) tool, aimed especially at business users, for creating their own virtual views without writing SQL.

For materializing data, incremental load and other options are available to Data Virtuality users.
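One common way to do an incremental load is to copy only the rows that are newer than a "watermark" already present in the target. The Python sketch below shows that idea with sqlite3. It is my own simplified assumption, not the Data Virtuality implementation: the orders and orders_copy tables are invented, and the approach relies on an ever-increasing id, so it picks up new rows but not updates or deletes.

```python
import sqlite3


def incremental_load(source, target):
    """Copy only rows whose id is above the highest id already in the target."""
    watermark = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders_copy"
    ).fetchone()[0]
    new_rows = source.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id", (watermark,)
    ).fetchall()
    with target:  # commit the batch as one transaction
        target.executemany("INSERT INTO orders_copy VALUES (?, ?)", new_rows)
    return len(new_rows)


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    target.execute(
        "CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, amount REAL)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)]
    )
    first = incremental_load(source, target)   # initial load copies both rows
    source.execute("INSERT INTO orders VALUES (3, 30.0)")
    second = incremental_load(source, target)  # only the new row is copied
    third = incremental_load(source, target)   # nothing new, nothing copied
    total = target.execute("SELECT COUNT(*) FROM orders_copy").fetchone()[0]
    source.close()
    target.close()
    return first, second, third, total


if __name__ == "__main__":
    print(demo())
```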

The picture below summarizes the concept of data virtualization and includes the terms which are specific to Data Virtuality.

On the left, you see the data sources, all different in nature.

On the right, the data consumers are displayed. They access data by querying a single source only: the Logical Data Warehouse platform, Data Virtuality.

The complexity of the left side is hidden from the data consumers. They only use SQL to access all the data sources seen on the left.

If accessing data on demand introduces serious latency problems, using the optional analytical storage to materialize data is one way to improve performance.

[Figure: Data Virtuality, Logical Data Warehouse or Data Virtualization platform]

If you have questions that I could not answer here, maybe you can have a look at the Frequently Asked Questions (FAQ) page about Data Virtuality.















Copyright © 2004 - 2020 Eralper YILMAZ. All rights reserved.