Dell has developed a reference architecture-type design for a combined data lake/data warehouse using third-party partner software and its own server, storage and networking hardware and software.
Like Databricks, Dremio, SingleStore and Snowflake, Dell is backing the single data lakehouse concept. The idea is that you have a single, universal store and don't need to run extraction, transformation and loading (ETL) processes to select raw data and get it into shape for use in a data warehouse. It is as if there is a virtual data warehouse inside the data lake.
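To make the contrast concrete, here is a toy, standard-library-only Python sketch. It is not Dell's design and not Delta Lake's API; the sample records and the function names `etl_copy`, `lakehouse_view` and `avg_temp_by_sensor` are hypothetical. The two-tier approach loads a second, reshaped copy of the raw data into a "warehouse", while the lakehouse idea applies a table shape at query time over the single stored copy.

```python
import json

# Hypothetical raw records: they land in the "lake" once, in semi-structured form.
RAW_LAKE = [
    '{"sensor": "a1", "temp_c": 21.5, "ts": "2022-06-27T10:00:00"}',
    '{"sensor": "a2", "temp_c": 19.0, "ts": "2022-06-27T10:00:00", "note": "recalibrated"}',
    '{"sensor": "a1", "temp_c": 22.1, "ts": "2022-06-27T11:00:00"}',
]


def etl_copy(raw):
    """Two-tier approach: extract, transform and load a separate warehouse copy."""
    warehouse = []
    for line in raw:
        rec = json.loads(line)
        # The table shape is fixed at load time; this copy must be kept in sync.
        warehouse.append((rec["sensor"], rec["temp_c"]))
    return warehouse


def lakehouse_view(raw, columns):
    """Lakehouse idea (toy version): project the requested columns at query time."""
    for line in raw:
        rec = json.loads(line)
        # Missing fields come back as None instead of failing the whole load.
        yield tuple(rec.get(c) for c in columns)


def avg_temp_by_sensor(rows):
    """A warehouse-style aggregate that works on either kind of row source."""
    totals = {}
    for sensor, temp in rows:
        s, n = totals.get(sensor, (0.0, 0))
        totals[sensor] = (s + temp, n + 1)
    return {k: s / n for k, (s, n) in totals.items()}


if __name__ == "__main__":
    print(avg_temp_by_sensor(lakehouse_view(RAW_LAKE, ["sensor", "temp_c"])))
    print(avg_temp_by_sensor(etl_copy(RAW_LAKE)))
```

Both paths return the same averages; the difference is that the ETL path keeps a second copy of the data, which is the extra complexity and cost the lakehouse pitch aims to remove.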
Chhandomay Mandal, Dell’s director of ISG solutions marketing, blogged: “Traditional data management systems, such as data warehouses, have been used for decades to store structured data and make it available for analysis. However, data warehouses are not set up to handle the increasing variety of data – text, graphics, video, Internet of Things (IoT) – and they also cannot support artificial intelligence (AI) and machine learning (ML) algorithms that require direct access to data.”
Data lakes can do that, he says. “Today, many organizations use a data lake in conjunction with a data warehouse – they store data in the lake and then copy it to the warehouse to make it more accessible – but this adds to the complexity and cost of the analytics landscape.”
What you need, according to Dell, is one platform that does it all, and Dell’s Validated Design for Analytics – Data Lakehouse is meant to provide it, supporting business intelligence (BI), analytics, real-time data applications, data science and machine learning. It is based on PowerEdge servers, PowerScale scale-out file storage, ECS object storage and PowerSwitch networking. The system can be housed on-premises or in a colocation facility.
The software components include the Robin Cloud Native Platform, Apache Spark (an open source analytics engine) and Apache Kafka (an open source distributed event streaming platform), along with Delta Lake technology. Databricks’ open source Delta Lake software is built on top of Apache Spark, and Dell uses it in its own data lakehouse.
Dell is also collaborating with Robin.io, which has been acquired by Rakuten, and its open source Kubernetes platform.
Dell recently announced a deal with Snowflake for remote table access and says this validated data lakehouse design complements it. Presumably, external Snowflake tables could reference data held in the Dell data lakehouse.
As the Dell diagram above shows, things are starting to look complicated. A Dell Solution Overview contains more information, along with this table:
Clearly this is not a turnkey system; a lot of careful research, component selection and sizing is required before signing a deal with Dell.
Interestingly, HPE has a somewhat similar product, Ezmeral Unified Analytics. It also uses Databricks’ Delta Lake technology, Apache Spark and Kubernetes. HPE is hosting its Discover event this week and lots of news is expected, so perhaps the timing of Dell’s announcement is no coincidence.