The Forrester Wave™: Streaming Analytics, Q3 2017: Gathr named a Strong Performer

“Impetus has the opportunity to make StreamAnalytix (now known as Gathr) the de facto tooling standard for Spark and future streaming engines,” states the report.

Forrester has positioned Gathr as a Strong Performer among the 13 most significant streaming analytics providers in The Forrester Wave™: Streaming Analytics, Q3 2017, one of the few, and perhaps the most comprehensive, market insight reports focused on streaming analytics today.

Gathr is a platform that makes creating real-time stream processing and machine learning applications on Apache Spark extremely easy. It is aimed at enterprises looking for a single visual platform that leverages popular open source big data platforms for streaming ETL and advanced analytics, and that is easy to use for both business and technical users.

What also shines about the Gathr solution is that it includes enterprise-grade visual tooling for both development and deployment of streaming applications.

–An excerpt from The Forrester Wave™: Streaming Analytics, Q3 2017

Forrester’s evaluation of Gathr resonates with our key focus areas; the report’s vendor profile section highlights the following aspects of Gathr:

  • Gathr offers use of both Apache Storm and Apache Spark and is architecturally positioned to support future use of other open-source streaming engines such as Apache Flink
  • Gathr embeds EsperTech to provide advanced streaming analytics capabilities such as complex event processing
  • Gathr tooling also unifies streaming and batch by supporting arbitrary Spark jobs such as machine learning

For more information on Gathr coverage in the Forrester Wave, read the full press release.

Aragon Research 2017: Gathr one of the four hot vendors in streaming analytics

Every time an independent research firm identifies Gathr (earlier known as StreamAnalytix) as a leading platform in the increasingly competitive space of streaming analytics, it is an exciting moment for us. Inclusion as one of the ‘Hot Vendors in Streaming Analytics 2017’ in a recently published report by Aragon Research, a technology-focused research and advisory firm, is one such proud moment.

The fact that we are one of only four players covered in the report makes it even more exciting. With this report, Aragon Research provides insight into new and noteworthy data management and streaming analytics providers. Each year, Aragon Research recognizes Hot Vendors across multiple markets that are doing something new or different. They may have new technology that expands capabilities, a new strategy that opens up markets, or simply a new way of doing business that makes them worth assessing.

The report validates our focus on the use of open source big data technologies such as Spark Streaming and Apache Storm for real-time data insight, and recommends Gathr for evaluation by enterprises that need a single visual platform that leverages popular open source big data platforms for streaming ETL and advanced analytics, and that is easy to use for both business and technical users.

Streaming data represents new avenues for creating value, and enterprises are beginning to pay attention to this new source of competitive advantage. The value is driven by new business insights from sensor data, web clickstreams, geolocation data, weather reports, market data, social media, and other event streams. Often it is the combination of multiple streaming and static sources of data that reveals powerful new insights. However, successfully using stream processing engines such as Apache Spark™ to build such advanced analytical applications can be a challenge, as it typically requires deep technical and data science skills.

Solution? Gathr! A platform that makes creating real-time stream processing and machine learning applications on Apache Spark extremely easy. It now offers a Visual Spark Studio for development and life-cycle management of Apache Spark applications in both streaming and batch mode. Earlier this year, as part of a contest we organized, even engineers who were new to Apache Spark were able to build complex machine learning applications for anomaly detection on the Gathr platform within a short span of six weeks.

Back to the topic, and in closing: we feel very thankful and immensely encouraged by the Aragon Research recognition as a ‘Hot Vendor’ and the validation of the benefit we strive to bring to enterprises, i.e., “powerful tooling and ease of use – over open source big data and fast data technologies”. For more information, read the full press release.

Structured Streaming: Simplifying the Building of Stream Analytics Applications

Last week the Gathr team hosted a webinar on Structured Streaming, “The Structured Streaming Upgrade to Apache Spark and How Enterprises Can Benefit”, and received overwhelming participation from the industry, including many of you reading this. Amit Assudani (Sr. Technical Architect – Spark, Gathr) and I took a deep dive into Structured Streaming and shared our views on how it enables the real-time enterprise and simplifies building stream processing applications on Spark. Here is a summary of our current view on Structured Streaming:
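To make the programming model concrete, here is a minimal, hedged sketch of a Structured Streaming query in PySpark. The socket source, console sink, and one-minute window are illustrative assumptions, not material from the webinar; the point is simply that a stream is treated as an unbounded table and queried with the same DataFrame API used for batch.

```python
# A minimal Structured Streaming sketch (socket source, console sink, and the
# 1-minute window are assumptions for illustration only).
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("StructuredStreamingSketch").getOrCreate()

# Structured Streaming treats the incoming stream as an unbounded, growing table.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .option("includeTimestamp", True)   # adds an event "timestamp" column
         .load())

# The same DataFrame API used for batch expresses the streaming aggregation:
# count events per 1-minute window.
counts = lines.groupBy(window(col("timestamp"), "1 minute")).count()

# Continuously write the updated result table to the console.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```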

Using Zeppelin to Build Data Science Models for Gathr

Data scientists use different tools to develop their models. For example, some prefer to create their models in R, while others write them in languages like Python or Scala using notebook tools such as Apache Zeppelin.

Gathr, a real-time streaming analytics platform, allows users to build and deploy data models using PMML, Scala, or PySpark. The platform supports multiple languages and formats, so users can write model code in their preferred technology. Once the model is prepared, it can be deployed on Gathr to run and perform scoring over the data in a distributed fashion.

This article explains how users can create a data model in an Apache Zeppelin notebook and use it with the Gathr platform. It also demonstrates how to use the PySpark library to build an SVM classifier in Zeppelin and use it on Gathr.
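As a rough sketch of what such a Zeppelin notebook cell could look like, here is a minimal PySpark example that trains a linear SVM and saves the model. The file paths, column names, and parameters are hypothetical, and Spark 2.2+ is assumed for LinearSVC; the saved model would then be registered with Gathr for distributed scoring as described above.

```python
# Minimal PySpark sketch of training a linear SVM in a Zeppelin notebook cell.
# The input path, feature/label column names, and output path are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LinearSVC
from pyspark.ml.feature import VectorAssembler

# In a Zeppelin %pyspark cell "spark" is already defined; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Labeled training data with numeric feature columns and a 0/1 "label" column.
df = spark.read.csv("/data/training.csv", header=True, inferSchema=True)

# Assemble the raw feature columns into the single vector column Spark ML expects.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df).select("features", "label")

# Train the linear SVM classifier (available in Spark 2.2+).
svm = LinearSVC(maxIter=50, regParam=0.01)
model = svm.fit(train)

# Persist the model so it can be loaded for scoring outside the notebook.
model.save("/models/svm_classifier")
```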

Spark Streaming Contest: Real-time Anomaly Detection Apps

At Impetus, we take data analytics innovation seriously. Very seriously. One of the ways we continue to improve our big data software products and services, and maintain our industry leadership, is through community programs that empower users to explore innovative uses for analytics technologies with our real-time streaming software, Gathr.

One of these programs was the inaugural Spark Streaming Innovation Contest, an international data hackathon that drew roughly 600 participants from around the world, with a grand prize of $10,000 for the best submission. Held from February through April, the contest was open to the general community, calling on business analysts and engineers to solve real-world anomaly detection problems.

Because hackathon participants vary in skill level and experience, we outfitted them with two tools: Apache Spark and Gathr. We wanted them to be able to access their data quickly while eliminating the need to build complicated models to gain insights.

Apache Spark™ is among the most popular stream processing engines thanks to its open source framework, powerful programming model, and advanced analytics capabilities. However, Spark typically requires a lot of setup, coding, and modeling; therefore, we equipped users with Gathr, a development platform that enables users to create real-time stream processing and machine learning applications.

Gathr makes anomaly detection on Apache Spark extremely easy, allowing developers to leverage their data quickly and spend their time gaining insights instead of programming. With these tools in hand, hackathon participants could build anomaly detection applications quickly, even without prior experience with Gathr.
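For context only, and not as a reconstruction of any contest entry, a common way to approach anomaly detection on Spark is to model normal behaviour with clustering and then flag points that lie far from every cluster centre. The sketch below assumes hypothetical feature columns, an illustrative input path, and an arbitrary distance threshold.

```python
# Illustrative KMeans-based anomaly detection sketch in PySpark (not a contest
# entry); feature columns, the input path, and the threshold are assumptions.
import math

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("AnomalySketch").getOrCreate()

# Historical data assumed to represent mostly "normal" behaviour.
df = spark.read.parquet("/data/metrics")
assembler = VectorAssembler(inputCols=["cpu", "mem", "latency"],
                            outputCol="features")
features = assembler.transform(df)

# Fit KMeans to model normal behaviour and capture the cluster centres.
model = KMeans(k=5, seed=42).fit(features)
centers = model.clusterCenters()

# Distance from each point to its nearest centre; a large distance suggests an anomaly.
def nearest_distance(v):
    arr = v.toArray()
    return float(min(
        math.sqrt(sum((x - c) ** 2 for x, c in zip(arr, centre)))
        for centre in centers))

dist_udf = udf(nearest_distance, DoubleType())
scored = features.withColumn("distance", dist_udf("features"))

# Flag the most distant points as anomalies (threshold chosen for illustration).
anomalies = scored.filter(scored.distance > 3.0)
anomalies.show()
```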

A panel of experts evaluated and scored each submission. The panel included the Gathr product team, architects, and engineers, as well as Alex Woodie, managing editor of Datanami, and Mike Matchett, senior analyst and consultant at Taneja Group.

Perhaps one of the most shocking discoveries we made is that this year’s winners weren’t even veteran data scientists. “I wouldn’t call myself a data science expert,” said Venu Kanaparthy of Redlands, California. Kanaparthy won the grand prize of $10,000 with his machine learning application for anomaly detection using Spark. Despite his limited experience, he says that he “was able to build a fully functional anomaly detection application on Spark working part-time evenings over about 4 weeks.”

A total of $18,000 was awarded in prize money, including prizes for two runners-up. The first runner-up, awarded $5,000, was Anindya Saha from Foster City, California. The second runner-up, awarded $3,000, was Kalyan Janaki from Denver, Colorado. We congratulate our winners and are already looking forward to next year’s competition.

Using Gathr to Calculate the Conversion Rate of a Website

Websites today are the cornerstone for driving business objectives and achieving revenue goals. Hence, business owners need to ask themselves the following questions:

  • Who are my potential customers?
  • What is their pattern of purchase?
  • How can I improve my website to increase business?

Gathr is an excellent platform for performing web analytics on any live clickstream data. You can track the performance metrics of a website in many ways; one such measure is the relative conversion rate. Let’s take a look at what it means, how it’s derived, and why it’s important.
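As a rough illustration of the arithmetic behind a basic conversion rate (sessions that convert divided by total sessions), here is a small PySpark sketch. The event names, schema, and source path are assumptions for the example, and Gathr's own visual pipeline is not shown here.

```python
# Rough PySpark sketch of computing a website conversion rate from clickstream
# events; the schema, event names, and source path are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import countDistinct, when, col

spark = SparkSession.builder.appName("ConversionRateSketch").getOrCreate()

# Clickstream events with at least: session_id, event_type ("view", "purchase", ...).
clicks = spark.read.json("/data/clickstream")

# Count all distinct sessions and the distinct sessions that contain a purchase.
rates = clicks.agg(
    countDistinct("session_id").alias("sessions"),
    countDistinct(when(col("event_type") == "purchase", col("session_id")))
        .alias("converting_sessions"))

# Conversion rate = sessions that converted / total sessions.
rates = rates.withColumn("conversion_rate",
                         col("converting_sessions") / col("sessions"))
rates.show()
```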

Streaming Big Data ETL with Impetus Gathr and Syncsort DMX – Guest Blog


Today we are announcing a partnership between Syncsort and Impetus Technologies, and our entry into an integration of batch processing and real-time stream processing that we call “Streaming ETL”. The mix of batch and real-time processing has also been referred to as the Lambda Architecture. Streaming ETL combines the best batch and streaming technologies under an umbrella of tools that abstract the complexity of the underlying platforms.

The huge increase in the types and sources of data has placed pressure on companies to blend and summarize that data quickly to create actionable information. A combination of real-time and batch processing is needed to meet these new demands.

There’s a grab bag of technologies that excel in specific aspects: Hadoop MapReduce, Storm, and Spark for massively parallel processing; Kafka and Spark Streaming, along with traditional messaging and queuing software, for real-time data movement; and Mesos and YARN for cluster management. These components can be mixed and matched, but there are many APIs to learn and different skill sets needed to leverage them well.
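To give a sense of the wiring that such tooling abstracts away, here is a hedged PySpark sketch of one streaming ETL hop: reading events from Kafka, filtering them, and appending to Parquet. The topic, broker address, schema, and paths are assumptions for illustration and are not part of the Gathr and Syncsort integration itself.

```python
# Hedged sketch of a single streaming ETL hop: Kafka source -> simple transform
# -> Parquet sink. Topic, brokers, schema, and paths are assumptions.
# (Running this requires the spark-sql-kafka package on the classpath.)
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("StreamingETLSketch").getOrCreate()

# Assumed event schema for the JSON messages on the topic.
schema = (StructType()
          .add("user_id", StringType())
          .add("ip", StringType())
          .add("amount", DoubleType()))

# Extract: consume raw JSON events from a Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Transform: parse the JSON payload and keep only high-value events.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*")
          .filter(col("amount") > 100.0))

# Load: continuously append the results to Parquet with checkpointing.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/etl/output")
         .option("checkpointLocation", "/data/etl/_checkpoints")
         .start())
query.awaitTermination()
```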

[Figure: Gathr pipeline definition with a Syncsort DMX bolt]

[Figure: DMX task that performs a lookup for IP]