Businesses don't get any points for how efficiently their infrastructure runs or how high they can stack all the Big Data they collect. What does count is the quality of the analytics and intelligence that data produces.
Over the last several years, Hadoop has become the word most synonymous with ingesting, processing, and transforming data. This open-source framework for distributed data storage and processing has spawned its own enterprise space and integrated its way into all the major cloud platforms. Hadoop is far from the only Big Data technology worth talking about, but it has become the one on which many others are built.
The problem for businesses is that the Hadoop space is full of distributions and tooling options, and, as Gartner Research Director Nick Heudecker explained, many of them look the same. Heudecker, whose research covers information management including the Big Data and NoSQL spaces, said that if you're looking at general data processing options, a lot of vendors offer very similar features.
Breaking Down the Market
There are three main pure-play Hadoop start-ups (Cloudera, Hortonworks, and MapR), and all three grew steadily in 2015. According to Gartner, each has approximately 700 customers, give or take 10 percent, putting the global market at between 2,100 and 2,400 Hadoop customers. All three offer both a free tier and an enterprise tier of their Hadoop distribution, and each makes significant open-source contributions to projects under the Apache Software Foundation (ASF) banner.
"Our data indicates that 44 percent of Hadoop use is currently unpaid," said Heudecker. "Is there a clear leader? I don't think so. They're all grabbing market share because it's a very new space."
In the last few months, much of the competition among the three has come down to data analytics capabilities and creative ways of integrating Apache Spark, an open-source Big Data processing engine with use cases ranging from real-time data streams to machine learning. MapR recently announced MapR Streams as part of a "converged data platform" integrating Hadoop, Spark-based stream processing, and analytics. Hortonworks rolled out an update to the Hortonworks Data Platform (HDP) with in-memory Spark analytics, and Cloudera offers a variety of open-source Spark integrations through its One Platform Initiative, along with Spark training classes.
"There's a lot happening in the information management and information infrastructure spaces, and it's not all Hadoop," Heudecker explained. "There's tremendous momentum behind Spark's speed and memory-centric data processing model, though Spark's development is still in its early stages. Spark will be another lingua franca in data processing, much like SQL today, and is definitely showing signs that it has some legs as more and more companies invest in it."
Heudecker also highlighted the importance of the cloud players in Big Data: the tech giants that have integrated Hadoop and other Big Data technologies into their existing Infrastructure-as-a-Service (IaaS) offerings.
Amazon Web Services (AWS) uses its Amazon Elastic MapReduce (EMR) service for cloud-based Hadoop orchestration. Microsoft offers a whole host of Big Data services within its Azure cloud platform, partnering with Hortonworks on its HDInsight service for managing Apache Hadoop, Spark, HBase, and Storm, along with its SQL-based Azure Data Lake and Azure Data Analytics. IBM has both its on-premises IBM Open Platform offering for Hadoop and IBM BigInsights, an analytics package that runs on top of it, along with managed Hadoop and Apache Spark-as-a-service in its Bluemix cloud. The list goes on, and businesses are finding more applicable use cases in the cloud.
"We estimate that AWS alone has about 5,000 customers, so that's over twice the customer base of the pure-plays combined," said Heudecker. "One of the advantages of moving into the cloud is that you get an ecosystem. You can get the pure-play Hadoop distributions on any of the IaaS offerings. MapR is available in all the clouds you can think of, other than IBM's; same for Cloudera and Hortonworks. We haven't seen cloud availability become too much of a factor when choosing between one vendor and another."
Choosing an Enterprise Data Strategy
When small to midsize businesses (SMBs) and growing enterprises invest in data processing and analytics solutions, Heudecker said, the deciding factor is which platform can provide the highest level of service. The biggest challenge for businesses, according to Gartner, is the skills gap: figuring out who's going to manage the platform once it's installed and deployed.
"If companies are looking for a data platform partner, who's going to help them with data ingest? Who's going to help them build the analytical application?" Heudecker said. "As far as the three pure-play Hadoop-ers, the evaluation criteria tend to be around the maturity of the management tools and consoles, the data governance tools, and the performance."
Another interesting aspect of choosing a Hadoop platform is the lack of loyalty. Companies re-evaluate their Hadoop platform as often as every six to 12 months to see whether the data processing components are still the right fit, because the space is changing so rapidly and the big players have differentiated themselves so little. Heudecker said 20 percent of the companies he's talked to have multiple Hadoop distributions running in their data centers or cloud, either letting different teams pick their platform of choice or diversifying intentionally to avoid getting locked into a single Hadoop distribution.
This kind of diversified platform portfolio feeds into what Frank Buytendijk, a Gartner Research Vice President and Distinguished Analyst focusing on digital strategy, calls "information as an asset." Just as you can't run a business without capital, labor, materials, and either physical or virtual facilities, Buytendijk said, you can't run one without information.
"We used to look at business in terms of the three flows: the primary flow was goods, the secondary flow being money, and tertiary flow was information to make sure the goods and money aligned. Now in most businesses it is the other way around. The primary flow is information, from identification and configuration to content marketing, etc. Whether you call that Big Data or not doesn't really matter."
"Big Data" Is Outdated
Buytendijk said he doesn't see Big Data as a separate technology for businesses, but as one theme or mindset within a company's overall digital strategy.
"I don't believe in having a Big Data strategy," said Buytendijk. "There is hardly a business strategy anymore without digital components, so I believe in having a digital strategy in which all kinds of technologies deliver critical capabilities. This includes mobile, social, cloud, IoT, smart machines, and Big Data."
Heudecker believes we'll start talking about "Big Data" less and less, because now it's just data. It's the way business is done. Massive volumes and high velocity of data are no longer as daunting.
"Big Data is becoming subsumed once again by information and analytics," said Heudecker. "The Big Data category is frankly not differentiating. We always get asked the size of the Big Data market, but what does that even mean? Big Data is not really a market, it's a concept. For a business, thinking about Big Data as something unique and special that's radically different than what you've done before is a mistake. At this point, data is just normal."