I appeared on the FastBook Reading Sessions organised by Weights & Biases to discuss the benefits of writing in Data Science. I wrote this piece to summarize what I covered there. This article was originally published on their forums, but I'm sharing a somewhat edited version here as well. Primarily, it discusses why writing matters in data science and how you can use it to strengthen your portfolio.
Note: All images are by the author except where stated otherwise.
Many applications, such as information retrieval, personalization, document categorization, and image processing, rely on computing the similarity or dissimilarity between items. Two items are considered similar if the distance between them is small, and dissimilar if it is large. So how do we calculate this distance? Well, each data object (item) can be thought of as an n-dimensional vector whose dimensions are the attributes (features) in the data. These vector representations make it possible to compute the distance between pairs of items using standard vector-based similarity measures like the…
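As a minimal sketch of the idea above, here is how two of the most common vector-based measures, Euclidean distance and cosine similarity, can be computed with NumPy (the feature vectors are hypothetical values, purely for illustration):

```python
import numpy as np

# Two items represented as n-dimensional feature vectors
# (made-up attribute values, for illustration only).
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Euclidean distance: straight-line distance between the points
# in feature space; smaller means more similar.
euclidean = np.linalg.norm(a - b)

# Cosine similarity: cosine of the angle between the vectors;
# it ignores magnitude, so parallel vectors score 1.0.
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean)   # sqrt(1 + 4 + 9) ≈ 3.742
print(cosine_sim)  # b = 2a, so the vectors are parallel: 1.0
```

Note how the two measures can disagree: here `b` is twice `a`, so the Euclidean distance is non-zero even though the cosine similarity is a perfect 1.0, which is why the right choice of measure depends on whether magnitude matters for your application.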
Emotion recognition is a common classification task. For instance, given a tweet, you create a model to classify it as either positive or negative. However, human emotion comprises a myriad of feelings and cannot be constrained to so few categories. Yet most of the datasets available for this purpose cover only two polarities, positive and negative, with a neutral class at times.
However, recently, I came across a new dataset constructed from Twitter data that seems to fill this void. The dataset, aka emotion dataset…
In their paper, Tabular Data: Deep Learning is Not All You Need, the authors argue that while deep learning methods have shown tremendous success in the image and text domains, traditional tree-based methods like XGBoost continue to shine on tabular data. The authors examined the TabNet, Neural Oblivious Decision Ensembles (NODE), DNF-Net, and 1D-CNN deep learning models and compared their performance against XGBoost on eleven datasets.
Every month I send out a newsletter that summarises my writing activity in the past month along with some interesting stuff that I encounter in the machine learning space. This month I am also sharing the newsletter here. If you are interested in the past editions or want to subscribe, the link is 👇
Welcome to the fourth edition of the newsletter. In this edition, we'll look at a cool visualization library, get some Kaggle tips from a Taiwanese Grandmaster, and look at ways to use Github more efficiently. There is also a guide to help newcomers avoid…
While going through the list of articles I have written to date, I discovered that quite a few relate to acquiring datasets for data science tasks. Some of those articles focus on finding good dataset websites, while others look at ways to create custom datasets. This article compiles the various concepts covered across those pieces; one can think of it as a summary of the techniques, with links back to the original articles.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journeys, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.
I recently got the chance to interview Kun-Hao Yeh — a Kaggle Competition’s Grandmaster and a Data Scientist at H2O.ai. Kun-Hao holds a Master’s degree in Computer Science from National Chiao-Tung University in Taiwan. His focus was on multi-armed bandit problems and reinforcement learning applied to computer games, including but not restricted to 2048…
This article compiles some useful tips and hacks that I have gathered from various sources over time while using Github. To avoid repetition, I have filtered out the ones that are already widely known. I'm sure you'll find the list useful, and you might like to apply these tips in your day-to-day work.
In 2020, there was a lot of furore on Twitter over the biased nature of its image cropping algorithm. The Twitterati complained that it favored white individuals and objectified women's bodies. Twitter promised to look into this issue, among several others, to ensure responsible AI practices. This article summarizes the problems with Twitter's image cropping algorithm, the findings of their research team, and how they intend to bring more transparency to their existing machine learning (ML) systems. The content of the article is based on the paper published by Twitter.