EDA: Exploratory Data Analysis

EDA stands for Exploratory Data Analysis. It is one of the most important and time-consuming steps in data analysis.

The goal is to understand the main characteristics of a data set by making tables and plots. Plots help the data scientist decide what to do with the data and choose the right machine learning model. They can also reveal problems in the data, which is why it is sometimes necessary to prepare the data before feeding it to the ML algorithm.
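
As a rough illustration of the "tables" side of EDA, the sketch below builds a small summary table for one numeric column using only the Python standard library. The `ages` list and its values are made up for this example, and `None` is assumed to mark a missing entry; in practice a library such as pandas is commonly used for this step.

```python
import statistics

# Hypothetical column of ages; None marks a missing value (an assumption for this sketch).
ages = [23, 35, 31, None, 52, 46, 29, 38]

# Separate the values that are present from the missing ones.
present = [a for a in ages if a is not None]

# Quartiles of the present values (default "exclusive" method of statistics.quantiles).
q1, median, q3 = statistics.quantiles(present, n=4)

summary = {
    "count": len(present),
    "missing": len(ages) - len(present),
    "min": min(present),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(present),
    "mean": statistics.mean(present),
}

# Print the summary as a simple two-column table.
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

A table like this already shows whether values are missing and whether the range looks plausible, which is the kind of finding that leads to a data preparation step.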

Many different plots can be made. Some of the most commonly used are:

  1. scatter plot
  2. box plot
  3. histogram
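
The three plot types listed above can be sketched with matplotlib as follows. This is a minimal example and not a prescribed workflow: the `heights` and `weights` data are synthetic and generated only for illustration, and the variable names are hypothetical.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt

# Synthetic data, made up for illustration only.
rng = random.Random(0)
heights = [rng.gauss(170, 10) for _ in range(200)]
weights = [0.9 * h - 85 + rng.gauss(0, 8) for h in heights]

fig, (ax_scatter, ax_box, ax_hist) = plt.subplots(1, 3, figsize=(12, 4))

# Scatter plot: relationship between two numeric variables.
ax_scatter.scatter(heights, weights, s=10)
ax_scatter.set_xlabel("height")
ax_scatter.set_ylabel("weight")
ax_scatter.set_title("scatter plot")

# Box plot: median, quartiles, and outliers of each variable.
ax_box.boxplot([heights, weights])
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(["height", "weight"])
ax_box.set_title("box plot")

# Histogram: distribution of a single variable.
ax_hist.hist(heights, bins=20)
ax_hist.set_xlabel("height")
ax_hist.set_ylabel("count")
ax_hist.set_title("histogram")

fig.tight_layout()
```

In a notebook the figure is rendered inline below the cell, so the three views can be compared side by side.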

Jupyter Notebook

A Jupyter Notebook, run locally through Anaconda or hosted on Google Colab, is ideal for prototyping and performing an EDA on raw data.