Writing about data science can have a transformative effect not only on your learning journey but also on your career.

I appeared on the organised by to discuss the benefits of writing in Data Science. I wrote this piece to summarize what I covered there. This article was on their forums, but I’m sharing a somewhat edited version here as well. It mainly discusses why writing matters in data science and how you can use it to strengthen your portfolio.

The best way to learn any concept, especially in data science, is by writing about it. It helps you understand the topic in detail, and your work might…


Read data directly from the clipboard without saving it first

Image by Author

When I write about a library or a new concept, I like to show how it works through examples. The datasets I use in my articles come from many sources. Sometimes I create simple toy datasets, while on other occasions, I go with the established dataset sites like and . However, every time I need to showcase a concept, I have to go through the laborious work of copying the data from the source, saving it to my system, and finally loading it into my development environment. Imagine my surprise when I discovered a built-in method in…


Speed up Inference of your scikit-learn models

Deep learning frameworks use tensors as their basic computational unit. As a result, they can utilize hardware accelerators (e.g., GPUs), which speeds up model training and inference. However, traditional machine learning libraries like scikit-learn are developed to run on CPUs and have no notion of tensors. As a result, they cannot take advantage of GPUs and miss out on the acceleration that deep learning libraries enjoy.

In this article, we’ll learn about a library called Hummingbird, created to bridge this gap. Hummingbird speeds up inference in traditional machine learning models by converting them…


Make your heatmaps stand out

Another Brick In The Wall | Image by Author

The GitHub contribution graph shows your repository contributions over the past year. A filled-up contribution graph is not only pleasing to the eye but also points to your hard work (unless you have hacked it). The graph, though pretty, also conveys considerable information about your activity. However, if you look closely, it is just a heatmap displaying some time-series data. Therefore, as a weekend activity, I tried to replicate the graph for some basic time-series data, and I share the process with you in this article.
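To make the "it is just a heatmap" observation concrete, here is a minimal sketch of the reshaping step behind a contribution-style graph: daily values are arranged into seven weekday rows with one column per week. The function names `to_calendar_grid` and `render`, the toy values, and the text-based shading are all assumptions made for illustration; they are not the plotting approach used in the article itself.

```python
from datetime import date


def to_calendar_grid(start, values):
    """Arrange daily values into 7 rows (Mon..Sun) with one column per week.

    `start` is the date of the first value; missing cells are filled with None.
    """
    # Pad the front so the first column begins on a Monday (Monday == 0).
    padded = [None] * start.weekday() + list(values)
    # Pad the tail so the last week is complete.
    padded += [None] * (-len(padded) % 7)
    n_weeks = len(padded) // 7
    # grid[weekday][week]
    return [[padded[week * 7 + day] for week in range(n_weeks)]
            for day in range(7)]


def render(grid, shades=".:*#"):
    """Render the grid as text, mapping larger values to denser characters."""
    present = [v for row in grid for v in row if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    lines = []
    for row in grid:
        chars = []
        for v in row:
            if v is None:
                chars.append(" ")  # no data for this day
            else:
                chars.append(shades[int((v - lo) / span * (len(shades) - 1))])
        lines.append("".join(chars))
    return "\n".join(lines)


# Toy data (hypothetical): 14 daily counts starting on Friday, 1 January 2021.
grid = to_calendar_grid(date(2021, 1, 1), range(14))
print(render(grid))
```

The same grid can be handed to any plotting library's heatmap function; the text renderer is only there to keep the sketch free of dependencies.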

Dataset and some preprocessing

The dataset that I’m going to use in this article comes from the Tabular Playground Series (TPS)…


In conversation with Dmitry Gordeev: A Data Scientist and a Kaggle Competition Grandmaster

Image by Author

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at , who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.

In this interview, I share my conversation with Dmitry Gordeev, also known as in the Kaggle world. He is a Kaggle Competitions Grandmaster and a Senior Data Scientist at . Dmitry studied at Moscow State University and graduated as a specialist in applied math/data mining. …


An overview and a tour of the course content

Time for another course | Image by Author

Massive Open Online Courses (MOOCs) are an indispensable part of the life of a self-taught data scientist. If you are in a room full of aspiring data scientists, the chances are that fifty percent of them have taken the famous . However, here is the twist. Even though many of us enroll in various online courses, only a handful of us complete them. In fact, a study titled , claims that the completion and retention rates of online courses are minimal. …


A whirlwind tour of five libraries that could be a great addition to your Data Science stack

Open-source is the backbone of machine learning. They go hand in hand. The rapid advancements in this field wouldn’t have been possible without the contribution of the open-source fraternity. Many of the widely used tools in the machine learning community are open source. Every year more and more libraries get added to this ecosystem. In this article, I present a quick tour of some of the libraries that I recently encountered and which could be a great supplement to your machine learning stack.

1️⃣. Hummingbird

Hummingbird is a library for compiling trained traditional machine learning models into tensor computations. This means you…


Important caveats to be kept in mind when encoding data with pandas.get_dummies()

Handling categorical variables is an essential part of a machine learning pipeline. While machine learning algorithms can naturally handle numerical variables, the same is not true of their categorical counterparts. Although there are algorithms like and that can inherently handle categorical variables, that is not the case with most other algorithms. Categorical variables must first be converted into numerical quantities before they can be fed into machine learning algorithms. There are many ways to encode categorical variables, such as one-hot encoding, ordinal encoding, label encoding, etc. …
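As a rough illustration of two of the encodings named above, the sketch below implements one-hot and ordinal encoding in plain Python so the mechanics are visible. The helper names `one_hot_encode` and `ordinal_encode` and the toy `sizes` column are hypothetical; in practice `pandas.get_dummies()` performs the one-hot step on a DataFrame, and this sketch is not a reimplementation of it.

```python
def one_hot_encode(values, categories=None):
    """Return (categories, rows); each row holds a single 1 marking its category.

    If `categories` is omitted, it is inferred from the data in sorted order.
    """
    if categories is None:
        categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:  # a value outside `categories` becomes an all-zero row
            row[index[v]] = 1
        rows.append(row)
    return categories, rows


def ordinal_encode(values, order):
    """Map each value to its rank in an explicitly supplied order."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]


# Toy column (hypothetical data).
sizes = ["small", "large", "medium", "small"]

cats, one_hot = one_hot_encode(sizes)
print(cats)     # column order of the one-hot rows
print(one_hot)

print(ordinal_encode(sizes, order=["small", "medium", "large"]))
```

Passing `categories` explicitly is a design choice of this sketch: it keeps the column layout fixed when the same encoding is applied to a second batch of data that may contain a different set of values.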


Hands-on Tutorials

Streamline your data science code repository and tooling quickly and efficiently

Good Code is its own best documentation

, in one of her , highlighted the importance of code reproducibility in a very subtle way:

“Why should you care about reproducibility? Because the person most likely to need to reproduce your work… is you.”

This is true on so many levels. Have you ever found yourself in a situation where it became difficult to decipher your codebase? Do you often end up with multiple files like untitled1.py or untitled2.ipynb? Well, if not all, a few of us must have undoubtedly faced the brunt of bad coding practices on…


Building interpretable Boosting Models with InterpretML

As summed up by , interpretability refers to the degree to which a human can understand the cause of a decision. A common notion in the machine learning community is that a trade-off exists between accuracy and interpretability. This means that the more accurate a learning method is, the less interpretable it tends to be, and vice versa. However, of late, there has been a lot of emphasis on creating inherently interpretable models and doing away with their black box counterparts. In fact, Cynthia Rudin argues that that deeply impact human lives. …

Parul Pandey

Data Science @H2O.ai | Editor @wicds
