Missing data occurs when no information is provided for one or more items, or for an entire record. In the real world, missing data is a big problem. pandas refers to missing values as NA (Not Available) values. Many datasets simply arrive in a DataFrame with missing data, either because the data exists but was not collected or because it never existed. For example, some users being surveyed may choose not to share their address; in this way, values go missing from many datasets.
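
As a rough illustration of the survey example, here is a minimal sketch of how missing values can be counted in pandas. The `survey` DataFrame and its values are made up for this example; only `isna()`, `any()`, and `sum()` are standard pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses (made up for illustration).
# None / np.nan mark answers that the user did not provide.
survey = pd.DataFrame({
    "user": ["a", "b", "c", "d"],
    "age": [34, np.nan, 29, 41],
    "address": ["12 Oak St", None, None, "7 Elm Rd"],
})

# isna() flags every missing cell; summing the flags counts them per column.
missing_per_column = survey.isna().sum()

# Keep only the rows that have at least one missing value.
incomplete_rows = survey[survey.isna().any(axis=1)]

print(missing_per_column)
print(incomplete_rows)
```

Counting missing values per column first is usually a sensible starting point, since it shows which features are affected before deciding whether to drop or fill them.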


We are going to use the Titanic - Machine Learning from Disaster dataset.

Pandas is a Python library that provides extensive means for data analysis. Data scientists often work with data stored in table formats like .csv, .tsv, or .xlsx. Pandas makes it very convenient to load, process, and analyze such tabular data using SQL-like queries. In conjunction with Matplotlib and Seaborn, Pandas provides a wide range of opportunities for visual analysis of tabular data.

The main data structures in Pandas are implemented with Series and DataFrame classes. DataFrames are great for representing real data:

  • rows correspond to instances (examples, observations, etc.)
  • columns correspond to features of these instances.
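
The row and column view described above can be sketched with a tiny DataFrame. The `passengers` table below is invented for illustration and is not taken from any real dataset.

```python
import pandas as pd

# Hypothetical table (made up for illustration): each row is one instance,
# each column is one feature of that instance.
passengers = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "age": [22, 38, 26],
    "fare": [7.25, 71.28, 7.93],
})

# shape is (number of rows, number of columns).
n_rows, n_cols = passengers.shape

# One row (an instance) selected by position; the result is a Series.
first_instance = passengers.iloc[0]

# One column (a feature) selected by name; also a Series.
age_feature = passengers["age"]

print(n_rows, n_cols)
print(first_instance)
print(age_feature)
```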

Now, let’s read the…

Matplotlib is a Python library used for plotting. Plots enable us to visualize data in graphical form.

Matplotlib is a widely used Python-based library for creating 2D plots and graphs. It plays an important role in visualization and is an important part of the Exploratory Data Analysis (EDA) step.

Types of Plots in Matplotlib Python:

  • Scatter Plot
  • Histograms
  • Line Charts
  • Bar Chart
  • Pie Chart
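
A minimal sketch of the five plot types listed above, drawn side by side. The data is made up, and the `Agg` backend is chosen here only so the script runs without a display; in a Jupyter notebook you would normally leave the backend alone and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
labels = ["A", "B", "C"]
counts = [5, 3, 2]

# One row of five axes, one per plot type.
fig, axes = plt.subplots(1, 5, figsize=(20, 4))

axes[0].scatter(x, y)               # scatter plot: one marker per (x, y) pair
axes[0].set_title("Scatter plot")

axes[1].hist(y, bins=5)             # histogram: distribution of the y values
axes[1].set_title("Histogram")

axes[2].plot(x, y)                  # line chart: points joined in order
axes[2].set_title("Line chart")

axes[3].bar(labels, counts)         # bar chart: one bar per category
axes[3].set_title("Bar chart")

axes[4].pie(counts, labels=labels)  # pie chart: share of each category
axes[4].set_title("Pie chart")

fig.tight_layout()
```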

Import libraries and Load Dataset.

We use the Titanic and Iris datasets for visualization. To read a CSV file, we need to import the pandas library in Python, along with matplotlib for plotting.

import pandas as pd
import matplotlib.pyplot…

The house price prediction competition is an amazing place to start.

House Prices: Advanced Regression Techniques

This blog is a reference for anyone who wants to start with Kaggle competitions (beginner friendly).


  • Python 3 (3.5 or 3.6 recommended)
  • Jupyter
  • Packages (pandas, numpy, matplotlib, seaborn, scikit-learn)

Before cleaning the data, we should understand our dataset.

Data Fields:

Here’s a brief version of what you’ll find in the data description file.

  • SalePrice: The property’s sale price in dollars. (This is the target variable that you’re trying to predict).
  • MSSubClass: The building class.
  • MSZoning: The general zoning classification.
  • LotFrontage: Linear feet of street connected to property.
  • LotArea: Lot size in square…
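
To get a first feel for fields like these, one possible starting point is to check column types and missing values. The sketch below uses the column names from the list above, but the rows are made up for illustration and are not taken from the competition data.

```python
import numpy as np
import pandas as pd

# Made-up rows using the column names listed above; values are illustrative only.
houses = pd.DataFrame({
    "MSSubClass": [60, 20, 70],
    "MSZoning": ["RL", "RL", "RM"],
    "LotFrontage": [65.0, np.nan, 60.0],
    "LotArea": [8450, 9600, 9550],
    "SalePrice": [208500, 181500, 140000],
})

# Data type of each column (numeric vs. text).
column_types = houses.dtypes

# Number of missing values in each column.
missing_counts = houses.isna().sum()

# Names of the numeric columns only.
numeric_columns = houses.select_dtypes(include="number").columns.tolist()

# Average of the target variable.
mean_price = houses["SalePrice"].mean()

print(column_types)
print(missing_counts)
print(numeric_columns)
print(mean_price)
```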

Iqra Naeem

Machine Learning | Data Science | Web Development
