Orchestrating Spark Jobs with Azure Data Factory

Processing Big Data Part II: Orchestrating Spark Jobs with Azure Data Factory

In Part 1 of this series, we learnt how to set up a Hadoop cluster on Azure HDInsight and run a Spark job to process huge volumes of data. In most practical scenarios, however, such jobs run as part of an orchestrated process or workflow, unless only one-time processing is needed. In our specific use case, we had to derive different metrics related to error patterns and usage scenarios from the log data and report them daily.

(more…)

Processing Big Data with Azure HDInsight and Spark - Part I

Before we delve into the interesting part, let me set the context first. The problem at hand was to crunch the log data for one of our client applications, so that we could analyze and report on various client-defined metrics. The application under consideration had a user base of more than 100,000 users, which meant millions of rows of data to process every day. Clearly, we were dealing with “big data.” Considering the volume of data involved, we decided to go with Spark running on an Azure HDInsight cluster to benefit from the increased performance offered by Spark’s in-memory RDDs (Resilient Distributed Datasets).

(more…)

Load Testing SignalR Hub-Based Applications - Part 2

In the first part of this series, we saw how to load test a SignalR Hub-based application using our custom SignalR load test component.

Our test objective was to find the optimum server configuration for hosting our SignalR Hub-based application under a load of up to 2,000 concurrent connections. We started off with an Azure VM instance with the following configuration for hosting the application: (more…)

Load Testing SignalR Hub-Based Applications - Part 1

Recently, we came across a requirement to support a large number of connected client apps that needed to maintain a continuous connection with the server. This would let us provide a server dashboard for managing the client applications remotely, with operations such as taking specific clients offline and bringing them back online later.

A WebSockets-based implementation using SignalR was a natural technology choice, given its versatility and flexibility along with its suitability for such connected application scenarios. However, there were a few hurdles to be crossed before proceeding with the implementation. (more…)