Monday, March 01, 2021

VCs Dump Dollars Into Data Lakes, Data Warehouses

There are two sides of the modern cloud data platform, data lakes and data warehouses, and while the line between warehouses and lakes has grown fuzzier, players on both sides of the aisle have been raking in the cash.

Massive late-stage investments reflect the growing importance of big data technology and analytics as enterprises struggle to extract in-depth insights from growing volumes of data across departmental silos, mainframes, and legacy systems. They also indicate that these types of cloud-delivered storage and analytics services will continue growing in value through 2021. Here’s a roundup of the most prominent data lake and data warehousing investments from the past six months.

Databricks closed a $1 billion Series G funding round that put the company at a $28 billion post-investment valuation. The February funding round attracted an A-list of venture investors and the cloud computing elite, including Franklin Templeton, BlackRock, Microsoft, Amazon Web Services (AWS), Salesforce Ventures, and CapitalG, the growth fund for Google parent company Alphabet.

Databricks, which is privately held, recently rounded out its public cloud support with the launch of Databricks on Google Cloud Platform (GCP), with integrations to Google’s BigQuery and AI Platform. This means Databricks can now instantiate a data “lakehouse” capable of data engineering, data science, machine learning, and analytics universally across the big three cloud providers and supply customers with a “single source of truth” to run all of their data workloads, the company claims.

Matt Aslett, research VP of data, AI, and analytics at 451 Research, explained that a data lakehouse is an environment designed to combine the data structure and data management features of a data warehouse with the low-cost storage of a data lake.

Dremio, a data analytics storage startup, earlier this year closed a $135 million Series D round that lifted its valuation to an even $1 billion, earning it unicorn status. To date, Dremio has raised $247 million in six funding rounds, and the Series D comes nine months after a $70 million round, during which time the company doubled its customer count, employee base, and revenue.

Dremio, which was founded by former MapR employees Tomer Shiran and Jacques Nadeau in 2015, offers an analytics service called Data Lake Engine. It provides fast query speeds and a self-service semantic layer that operates directly against data lake storage. The platform is built on open source technologies including Apache Arrow and Apache Arrow Flight, which the company co-created to provide columnar, in-memory data representation and sharing.

Shiran explained that Dremio enables business analysts and data scientists to explore and analyze any data in a self-service fashion at any time, regardless of location, size, or structure, using their preferred tools such as Tableau, Python, and R.

Firebolt, a cloud-native data warehouse startup, emerged from stealth late last year with $37 million in financing and an offering that promises speeds hot enough to melt Snowflake’s hype into a springtime puddle.

The Israel-based startup took the best of Snowflake, addressed its limitations in storage, indexing, and query optimization, and built a cloud-native data warehouse that, according to Firebolt CEO Eldad Farkash, is more cost-effective for ad hoc interactive, high-performance, semi-structured data, and operational or customer-facing analytics that require continuous ingestion.

The cloud-native data warehouse is built on a consumption-based model and features what the company calls rapid warehousing, which is really just an SQL query engine for semi-structured data with native array manipulation functions.

“If you look at companies who switch from understanding their business to driving their business with data, using the traditional data warehousing or the traditional data lake approach just doesn’t cut it anymore,” Farkash said.

To that end, Firebolt is a complete redesign of the data warehouse for the cloud era and data lakes, Farkash said. “Our aim is to enable organizations to deliver an incredible data analytics experience regardless of the size and usage patterns of a company’s data without having to constantly be worried about performance and costs.”

Starburst Data, an analytics software provider, scored a $100 million Series C funding round in January, bringing its valuation to $1.2 billion. The 3-year-old, Boston-based company has raised more than $164 million in the last 12 months. The latest VC haul was led by Andreessen Horowitz with participation from Salesforce Ventures and existing investors Coatue and Index Ventures.

Starburst was founded by CEO Justin Borgman, the former co-founder and CEO of Hadapt, after Hadapt was acquired by Teradata in 2014. However, Starburst has since shifted its focus from Hadoop to Presto, an open-source query project that serves as the engine behind the startup’s enterprise distribution platform.

“Digital transformation has become an operational requirement. Organizations are relying on data-driven insights to develop a competitive advantage, reduce costs, and more quickly identify new opportunities,” Borgman said. “Unfortunately, even with millions of dollars invested in expensive data management tools, most organizations are still making decisions that are often too slow or based on incomplete, irrelevant data. Investors are taking note and backing solutions that help to solve that problem.”

Borgman said the funding round will be used for product development and global expansion.

Snowflake, the first cloud data warehouse to decouple storage and compute, blew the roof off the New York Stock Exchange last September with a record-setting initial public offering (IPO). The 8-year-old startup sold 28 million shares, raising its expected IPO price from $85 to $110 before settling on $120 per share. Snowflake raised $3.4 billion at a valuation of $33 billion in its IPO.

Snowflake provides data warehouse services using a cloud-based architecture offered through an as-a-service model. This allows enterprise customers to analyze data stored in a central repository using business intelligence tools or other analytics applications. Its platform supports the three largest public clouds (AWS, Microsoft Azure, and Google Cloud), though it also competes against these platforms.

“Snowflake disrupted the crowded, well-established data warehousing market by rapidly innovating a seamless service that tapped the scale, elasticity and economics of the public cloud,” said Clumio CEO Poojan Kumar in an emailed statement. Clumio provides cloud-based enterprise backup services, and Kumar is a self-described “strong admirer” of Snowflake.

“Snowflake transformed the old-school data warehouse into a cloud data platform and completely changed the way companies do business today, and its customers have felt the biggest benefit with huge cost savings and productivity increases with faster access to their data,” Kumar added.
