Extracting Data from Common Crawl Dataset

In natural language processing (NLP), data is king. The more data you have, the better your results tend to be. Most new research is freely accessible these days and, thanks to the cloud, computing power is available on demand. Yet what often keeps an NLP researcher from achieving state-of-the-art results is the lack of good data.

Enforcing Social Distancing in Shops Using YOLO and OpenCV

Small businesses thrive on in-store customers. When they reopen post-lockdown, a major challenge will be ensuring the safety of their staff and customers. Sanitizing and limiting shop occupancy are important safety measures, but so is social distancing. How can small shops, with their limited resources, monitor their customers and enforce social distancing?

Real-time object detection is a potential solution.

Malayalam Subword Tokenizer

Let’s start with the obvious question: what is a tokenizer? In Natural Language Processing (NLP), a tokenizer performs a text preprocessing step in which text is split into tokens. Tokens can be sentences, words, or any other unit that makes up a text.

Most NLP packages ship with a word tokenizer. Malayalam tokenization, however, poses a particular challenge.
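To make the idea of tokens concrete, here is a minimal sketch of sentence-level and word-level tokenization in Python. It uses only the standard `re` module and naive rules suited to simple, space-delimited English text. The function names `sentence_tokenize` and `word_tokenize` are hypothetical helpers chosen for illustration, not the API of any particular NLP package, and this is not the Malayalam tokenizer the post goes on to describe.

```python
import re


def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    sample = "Data is king. More data helps!"
    print(sentence_tokenize(sample))  # ['Data is king.', 'More data helps!']
    print(word_tokenize(sample))      # ['Data', 'is', 'king', '.', 'More', 'data', 'helps', '!']
```

Rules this simple break down quickly on abbreviations, numbers, and languages with richer word structure, which is one reason dedicated tokenizers exist.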

Natural Language Processing—The Emerging Force Within the AI Family

Natural Language Processing (NLP) is a field within Artificial Intelligence (AI) that allows machines to parse, understand, and generate human language. This branch of AI can be applied to multiple languages and across many different formats, such as unstructured documents and audio.

With the NLP market anticipated to be worth $13.4 billion in 2020, this field of AI merits a closer look.

This article explains how NLP works, how it is used, and what the future looks like for this exciting area of AI.

OCR: Extracting Printed Text from Scanned Documents - Part 1

One of our clients in the banking sector recently came to us with a request (or rather, a challenge).

While it is true that digitalization has brought a world of difference to banking, we are still nowhere near paperless banking. Regulations require banks to collect different types of documents from customers at the time of onboarding and for various other services. 

On average, a single branch has to process hundreds of these documents every day. Automating this workflow would save the bank considerable time and labor. The client wanted to build an Optical Character Recognition (OCR) solution that could be seamlessly integrated into the existing banking software.

Building a Food Image Detector

Image processing has progressed so much in the recent past that most of our customer inquiries for camera-based mobile applications now include some kind of image processing. One such request was to build a food image detection module that detects multiple food items in a mobile camera image and marks their positions. The following post details the various stages of our research and how we eventually built a highly accurate food image detector.

AI's Journey - Featured Image of Blog Post by Simon Chambers for QBurst

AI’s Journey

Artificial Intelligence (AI) is talked about everywhere, from its admirable role in cancer diagnosis to scaremongering about robots taking over jobs. Where did AI come from though, and how did it creep up on us all of a sudden?

With a slant towards the UK and Europe, this article attempts to give a holistic view of AI’s journey and shed light on its breadth and scale. It also intends to dispel some myths, raise awareness about the issues that surround AI, and suggest areas for consideration in the future.

Top 4 Text-to-Speech Software with Natural Voices

Generating natural speech remains the holy grail of text-to-speech (TTS) synthesis systems. While intelligible TTS systems are well developed, the quest for the perfect-sounding one is still on. Remember Duplex, which debuted at Google I/O 2018? This AI-powered Google Assistant technology famously dialed up a salon and negotiated an appointment with the staff in an intelligent back-and-forth conversation, all without giving away the fact that it was a machine.
