By adopting an architecture that meets your specific requirements during setup, you can get optimal performance from your Amazon Redshift cluster. Let us take a look at some of the architectural choices available to manage workloads and steer clear of outages.
(more…)
Apple Business Manager—The New Path for Distributing iOS Apps In-House
In 2020, when a surging pandemic and safety protocols shuttered many offices, we were among the IT firms to switch entirely to remote work. We adapted to its rhythms almost instantly, logging into work like clockwork, collaborating over Zoom and Meet, and diligently meeting deadlines and release dates. But access to internal systems and data still had to be ironed out, and an in-house mobile application became the need of the hour. Building the iOS app was no sweat, but the distribution path wasn’t so clear-cut at first.
I’ll soon explain how we sorted this out.
(more…)
Public Key Info Hash SSL Pinning in Swift Using TrustKit
This post is for anyone trying to implement Public Key Info Hash SSL Pinning in iOS using TrustKit. The process is quite straightforward until you goof up by missing a tiny detail. A lot of documentation is already available on this topic; I’m just bringing the whole process under one roof.
(more…)
Building Covid-19 Twitter Data Aggregation Platform with PySpark CLI
In a previous blog post, we explained how you can use PySpark CLI to jumpstart your PySpark projects. Here, I’ll explain how it can be used to build an end-to-end real-time streaming application.
(more…)
Automating Insurance Claim Adjudication
Claim adjudication, the process by which an insurance company determines its financial liability for a claim, is quite complex and time-consuming. Adjudication can be quick if the claim received is accurate in every detail and falls within the limits of the policy. But, as with most things in life, this is rarely the case.
(more…)
PySpark CLI—An Efficient Way to Manage Your PySpark Projects
In the world of big data analytics, PySpark, the Python API for Apache Spark, has a lot of traction because of its rapid development possibilities. Apart from Python, Spark provides high-level APIs in Java, Scala, and R. Despite the simplicity of the Python interface, setting up and running a new PySpark project involves long commands. Take, for example, a typical command to run an ETL job in a new project:
$SPARK_HOME/bin/spark-submit \
  --master local[*] \
  --packages 'com.somesparkjar.dependency:1.0.0' \
  --py-files packages.zip \
  --files configs/etl_config.json \
  jobs/etl_job.py
Hardly the most convenient or intuitive way to get a simple project structure up and running.
So is there an easy way to get started with PySpark?
(more…)
Ease the “Full-Stack Pressure” with Laravel Livewire
In the contemporary web development world, being a full-stack developer is the new norm. You may be a backend or frontend developer, but you are expected to know technologies outside your field of specialization. Talk about full-stack pressure! I was a PHP developer when I felt this pressure for the first time. There is no question that JavaScript is a great language. But for someone like me, more at home on the server side with PHP, it was a bit overwhelming. That scene changed with the arrival of Laravel Livewire.
(more…)
Comparative Analysis of ML Models for Fraud Detection
A large variety of fraud patterns combined with insufficient data on fraud makes insurance fraud detection a very challenging problem. Many algorithms are available today to classify fraudulent and genuine claims. To understand the various classification algorithms applied in fraud detection, I did a comparison using vehicle insurance claims data.
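As a miniature sketch of what such a comparison involves, the snippet below pits a few common classifiers against each other on an imbalanced binary problem. The synthetic dataset and the particular model list are stand-ins of my own, not the vehicle insurance claims data or the exact models compared in the post.

```python
# Compare a few classifiers on an imbalanced binary problem standing in
# for fraud (minority class) vs. genuine (majority class) claims.
# Synthetic data; the actual study used vehicle insurance claims data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# ~10% positive ("fraud") samples to mimic class imbalance.
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# F1 is a fairer yardstick than plain accuracy on imbalanced classes.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    results[name] = scores.mean()
    print(f"{name}: mean F1 = {results[name]:.3f}")
```

Cross-validated F1 (rather than accuracy) is used here because a model that labels every claim genuine would still score high accuracy on data this skewed.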
(more…)
Generating Malayalam Word Embeddings: A Case Study
Research shows that children primarily learn languages by observing patterns in the words they hear. Computer scientists are taking a similar approach to train computers to process human language.
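The pattern-observation idea can be sketched in miniature: characterize each word by the words that appear around it, then compress those counts into dense vectors. This toy uses a tiny English corpus and plain NumPy as stand-ins; the case study itself deals with a Malayalam corpus and proper embedding models such as Word2Vec.

```python
# Toy word embeddings from co-occurrence counts: the intuition behind
# models like Word2Vec is that words are characterized by the company
# they keep. English stand-in corpus; the case study targets Malayalam.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-1 word window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                counts[index[w], index[sent[j]]] += 1

# Truncated SVD compresses the counts into 2-dimensional embeddings.
U, S, _ = np.linalg.svd(counts)
embeddings = U[:, :2] * S[:2]

def similarity(a, b):
    """Cosine similarity between two word vectors."""
    va, vb = embeddings[index[a]], embeddings[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(similarity("cat", "dog"))  # words seen in similar contexts score high
```

Real embedding models are trained on far larger corpora and learn the vectors directly rather than factorizing raw counts, but the principle is the same: meaning is inferred from patterns of co-occurrence.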

NLP Libraries for Malayalam Sentence Tokenization: An Exploratory Study
Imagine that you are working on machine translation or a similar Natural Language Processing (NLP) problem. Can you process the corpus as a whole? No. You will have to break it into sentences first and then into words. This process of splitting an input corpus into smaller subunits is known as tokenization, and the resulting units are called tokens. For instance, when paragraphs are split into sentences, each sentence is a token. This is a fairly straightforward process in English but not so in Malayalam (and some other Indic languages).
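The two-level tokenization described above can be sketched with a naive, rule-based splitter. The English sample text and the punctuation-based regex are my own simplifications; the post's point is precisely that rules this simple break down for Malayalam and other Indic languages.

```python
# Naive two-level tokenization: corpus -> sentences -> words.
# A simple regex works passably for English; Malayalam sentence
# boundaries need the language-aware tooling this post surveys.
import re

corpus = "Tokenization splits text into units. Sentences come first. Then words."

# Split on sentence-ending punctuation followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", corpus.strip())

# Split each sentence into word tokens, dropping punctuation.
words = [re.findall(r"\w+", s) for s in sentences]

print(sentences)  # 3 sentence tokens
print(words[0])   # → ['Tokenization', 'splits', 'text', 'into', 'units']
```

Even in English, this rule misfires on abbreviations like "Dr." or "e.g.", which is why practical tokenizers rely on trained models or curated exception lists rather than a single regex.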
(more…)