
Data Warehouse And Data Lake: Definition And Differences

But where does the data live in a company? The question brings us to two further terms, Data Warehouse and Data Lake, which need at least a minimum of clarification, above all to avoid the typical and superficial approach that declares certain technologies dead because they do not fully meet the needs of the “digital” (I am thinking of mainframes, ERPs, etc.). The ease with which certain technologies are branded as trendy, “rock” technologies, sometimes for marketing reasons too, leads people to think that the “others” must be quickly dismissed.

According to some, the Data Warehouse should meet this fate. Building one requires heavy investment, but it still performs well for certain types of analysis, which is why hybrid architectures, in which a traditional Data Warehouse works alongside a more flexible Data Lake, are increasingly common.

The term Data Lake was coined in 2010 and is attributed to James Dixon, then CTO of Pentaho (a BI company later acquired by Hitachi Data Systems), who used the metaphor of water (the data) held either in a lake or in bottles (the basin and the containers) to illustrate the concept:

Data Warehouse

The water is bottled in separate containers according to the source that generated it and the destination and use for which it is intended; the data is then saved in homogeneous formats, refined, and made ready for distribution to operators and systems with specific functions.

The ETL (Extract, Transform, Load) process is typical of Data Warehouses. It underpins the so-called “schema on write” approach: the structure of the database is defined a priori, the data is written into that structure, and only then is it analyzed.
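To make the schema-on-write idea concrete, here is a minimal ETL sketch using Python's built-in sqlite3 module as a stand-in for a warehouse. The table, its columns and the raw records are hypothetical, chosen only for illustration. The point it shows is the order of operations: the table structure exists before any data arrives, and records that cannot be coerced into it are rejected during the transform step rather than stored.

```python
import sqlite3

# Hypothetical raw records from a source system; field names are illustrative only.
RAW_ORDERS = [
    {"id": "1", "amount": "19.90", "region": "north"},
    {"id": "2", "amount": "5.00", "region": "SOUTH"},
    {"id": "3", "amount": "n/a", "region": "north"},  # cannot be cast to a number
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(RAW_ORDERS)


def transform(records):
    """Transform: cast types and normalize values so they fit the target schema.
    Records that cannot be coerced are set aside instead of being loaded."""
    clean, rejected = [], []
    for r in records:
        try:
            clean.append((int(r["id"]), float(r["amount"]), r["region"].lower()))
        except (KeyError, ValueError):
            rejected.append(r)
    return clean, rejected


def load(conn, rows):
    """Load: write rows into a table whose structure was defined beforehand."""
    conn.executemany("INSERT INTO orders (id, amount, region) VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    # Schema on write: the structure is fixed before any data is written.
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount REAL NOT NULL, region TEXT NOT NULL)"
    )
    clean, rejected = transform(extract())
    load(conn, clean)
    return conn, rejected


if __name__ == "__main__":
    conn, rejected = run_etl()
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "rows loaded")
    print(len(rejected), "record(s) rejected")
```

In a real warehouse the transform step is usually far heavier (deduplication, conformed dimensions, history tracking), but the dependency is the same: data only enters once it matches the predefined structure.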

The process is complex and expensive, especially when new sources have to be integrated later on, which is essential if the container is not to become obsolete quickly.
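Why are new integrations costly under schema on write? A small sketch, again with sqlite3 and a hypothetical `orders` table, shows that a new source carrying an unknown field cannot simply be written: the structure has to be migrated first, and existing rows need a value for the new column. The `channel` field and the `'store'` backfill value are assumptions made purely for illustration.

```python
import sqlite3


def integrate_new_source():
    conn = sqlite3.connect(":memory:")
    # Original, predefined structure.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
    conn.execute("INSERT INTO orders VALUES (1, 19.9)")

    # A hypothetical new source (e.g. an online shop) carries a "channel" field
    # that the schema does not know about, so the insert is refused.
    try:
        conn.execute("INSERT INTO orders (id, amount, channel) VALUES (2, 5.0, 'web')")
    except sqlite3.OperationalError:
        # Migration: change the structure, then backfill the existing rows
        # (the 'store' default is an illustrative assumption).
        conn.execute("ALTER TABLE orders ADD COLUMN channel TEXT")
        conn.execute("UPDATE orders SET channel = 'store' WHERE channel IS NULL")
        conn.execute("INSERT INTO orders (id, amount, channel) VALUES (2, 5.0, 'web')")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = integrate_new_source()
    print(conn.execute("SELECT id, channel FROM orders ORDER BY id").fetchall())
```

On a production warehouse the same step typically touches ETL jobs, reports and historical data as well, which is where most of the cost lies.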

Its features and access modes suit users who need relatively simple analyses and, above all, who know what they are looking for and which relationships they want to analyze. These are business users, who find in these “containers”, and in the business intelligence systems that query them, excellent answers for reporting and for understanding trends in known phenomena.
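The kind of question a warehouse answers well is one whose shape is known in advance. As a hedged illustration, the sketch below loads a few made-up sales rows into a hypothetical `sales` table and runs a fixed report, total sales per region per month, the sort of query a BI dashboard would issue repeatedly against the same structure.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory table with illustrative (made-up) sales figures."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (month TEXT NOT NULL, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("2023-01", "north", 100.0),
            ("2023-01", "south", 80.0),
            ("2023-02", "north", 120.0),
            ("2023-02", "south", 60.0),
        ],
    )
    return conn


def sales_trend_by_region(conn):
    """A predefined report: the question and the relationship are known up front."""
    return conn.execute(
        "SELECT region, month, SUM(amount) FROM sales "
        "GROUP BY region, month ORDER BY region, month"
    ).fetchall()


if __name__ == "__main__":
    for region, month, total in sales_trend_by_region(build_warehouse()):
        print(region, month, total)
```

Because the structure is stable, such reports are cheap to run and easy to put behind a dashboard; the trade-off is that questions outside that structure require changing it.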

In the first decade of the 2000s, the evolution of these solutions focused on the user interface and on letting users build their own dashboards for the analyses that interest them.

Data Lake

Data, like water, flows freely into the “basins” from the sources that generate it, and in these basins it is examined and sampled in its original format. Only when applications and operators query it is it converted into formats that business systems can read and compared with other information.

A Data Lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, log, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).

Each element is uniquely identified and described by a set of tags that correspond to its metadata; when the Data Lake is queried about a specific problem, these tags make it possible to extract the relevant data and then submit it to analysis.
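The tagging mechanism can be sketched as a simple metadata catalog. This is not how any particular product implements it; the `MetadataCatalog` class, the object paths and the tag names below are hypothetical. Each stored object gets a unique identifier plus free-form tags, and a query selects the objects whose tags match the problem being investigated, without opening the objects themselves.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """A raw object in the lake, described only by its location and metadata tags."""
    path: str
    tags: dict
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier


class MetadataCatalog:
    """Hypothetical catalog: maps unique ids to objects and filters them by tag."""

    def __init__(self):
        self._objects = {}

    def register(self, path, **tags):
        obj = LakeObject(path=path, tags=tags)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def find(self, **criteria):
        """Return the objects whose tags match every key/value in the criteria."""
        return [
            obj for obj in self._objects.values()
            if all(obj.tags.get(k) == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.register("crm/complaints.json", format="json", topic="churn")
    catalog.register("calls/2023-01.wav", format="audio", topic="churn")
    catalog.register("web/clickstream.log", format="log", topic="marketing")
    print([o.path for o in catalog.find(topic="churn")])
```

Real catalogs add schemas, lineage and access control on top, but the principle of selecting data through its metadata is the one described above.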

This follows a specific model, the so-called “schema on read”: data is stored without first being “written” into a predefined structure, and a structure is applied only when the data is read. Compared with the Data Warehouse, the Data Lake has the advantage of being very flexible and does not require a lengthy implementation.

It is also simpler to manage and easily reversible: different types of data can be added and deleted without changing any structure.
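A minimal schema-on-read sketch, using only the standard library: raw objects are kept exactly as they arrived (a JSON Lines file and a CSV file, both hypothetical), and the structure, meaning which fields to read and how to join them, is decided by the query that reads them. Adding or deleting objects in the `LAKE` dictionary requires no schema change; only the queries that use those objects care.

```python
import csv
import io
import json

# Hypothetical lake: objects stored as they arrived, with no schema enforced.
LAKE = {
    "events/2023-01.jsonl": (
        '{"user": "a", "action": "login"}\n'
        '{"user": "b", "action": "buy", "amount": 12.5}\n'
    ),
    "crm/customers.csv": "user,country\na,IT\nb,DE\n",
}


def read_jsonl(raw, fields):
    """Schema on read: the caller chooses the fields; missing ones become None."""
    rows = []
    for line in raw.splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append({f: record.get(f) for f in fields})
    return rows


def read_csv(raw):
    """Parse a raw CSV object into dictionaries only at query time."""
    return list(csv.DictReader(io.StringIO(raw)))


def purchases_by_country(lake):
    """Join two differently formatted raw objects, structuring them only now."""
    events = read_jsonl(lake["events/2023-01.jsonl"], ["user", "action", "amount"])
    country = {row["user"]: row["country"] for row in read_csv(lake["crm/customers.csv"])}
    totals = {}
    for e in events:
        if e["action"] == "buy":
            c = country.get(e["user"], "unknown")
            totals[c] = totals.get(c, 0.0) + (e["amount"] or 0.0)
    return totals


if __name__ == "__main__":
    print(purchases_by_country(LAKE))
    # New data of any type can be dropped in without altering anything else.
    LAKE["images/logo.png"] = b"\x89PNG..."
    print(purchases_by_country(LAKE))
```

The flexibility has a cost that the next paragraph hints at: every query must know how to interpret the raw data, which is why this style of analysis tends to need specialist skills.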

Data Lakes are ideal for carrying out ever deeper analyses and for discovering unexpected relationships that can lead to new business opportunities, but they require specialized analysts, above all data scientists, who are in short supply.

According to Forrester, however, the results fell short of expectations: data lakes proved too expensive and slow to update, and much of the work done turned out to be useless, repeating the failures already seen in the past with data warehouses and BI platforms.

Tech Buzz Reviews
Techbuzzreviews is a team of web designers, freelancers, marketing experts and bloggers. We are on a mission to provide the best technology-related news with passion and tenacity. We mainly focus on the latest technology news, upcoming gadgets, business strategies and other trends from around the world.
