Exploratory Data Analysis
Visualization
Non-Visualization (Numerical)
Measure of Central Tendency
Measure of Spread/Dispersion
Skewness
Kurtosis
Percentiles
Z Score
Purpose: To find the middle (central) value of a data distribution
1.Mean: Average of the numerical values
2.Median: Middle-most value
3.Mode: Most frequent value
Mean = (Sum of all values) / (Number of values)
Sort the values in ascending order and find the middle value (for an even number of values, take the average of the two middle values)
Find the value that is repeated the most times
Disadvantage: Sensitive to outliers
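A minimal Python sketch of the three measures, using the standard-library `statistics` module on a made-up data set (the values are hypothetical, not from the map). The single large value (40) pulls the mean well above the median, which illustrates the mean's sensitivity to outliers.

```python
import statistics

# Hypothetical sample data with one large outlier (40).
data = [2, 3, 3, 5, 7, 10, 40]

mean = sum(data) / len(data)      # (sum of all values) / (number of values)
median = statistics.median(data)  # middle value after sorting
mode = statistics.mode(data)      # most frequently repeated value

print(mean, median, mode)
```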
Purpose
Types
Univariate Plots
Bivariate Plots
Multivariate Plots
Continuous
Categorical
1.Histogram
2.Density Plot
3.Boxplot
Bar Plot
Count plot
Heat map
4.Break down the data
5.Comparison
6.Distribution
7.Relationship
8.Trend
Variance
Standard Deviation
Range
Measure of how spread out or dispersed the values in a data set are
σ^2 = (Σ(xᵢ - μ)^2) / N
1.Sensitive to outliers
2.Expressed in squared units of the data
Standard Deviation (σ) = sqrt(variance)
Range = max - min
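A short sketch of the three spread measures in plain Python, following the population formula σ^2 = (Σ(xᵢ - μ)^2) / N used in this map (dividing by N, not N - 1). The data set is hypothetical.

```python
import math

# Hypothetical sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)                               # mean μ
variance = sum((x - mu) ** 2 for x in data) / len(data)  # σ² = Σ(xᵢ - μ)² / N
std_dev = math.sqrt(variance)                            # σ = sqrt(variance), back in the data's units
value_range = max(data) - min(data)                      # range = max - min

print(variance, std_dev, value_range)
```

Because the variance is in squared units, the standard deviation is usually the easier of the two to interpret.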
Quantifies the asymmetry of a probability distribution or the shape of a data set
γ₁ = (Σ((xᵢ - μ)³) / N) / (σ³)
If γ₁ = 0: perfectly symmetric
If γ₁ > 0: positive/right skewness (tail is on the right)
If γ₁ < 0: negative/left skewness (tail is on the left)
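A minimal sketch of the skewness formula γ₁ = (Σ((xᵢ - μ)³) / N) / (σ³), with σ computed as the population standard deviation (divide by N). The function name `skewness` and the sample lists are illustrative.

```python
import math


def skewness(data):
    """Population skewness: third central moment divided by σ³."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    third_moment = sum((x - mu) ** 3 for x in data) / n
    return third_moment / sigma ** 3


print(skewness([1, 2, 3, 4, 5]))        # symmetric data
print(skewness([1, 1, 1, 2, 10]))       # long tail on the right
print(skewness([-1, -1, -1, -2, -10]))  # long tail on the left
```

Libraries often apply a small-sample bias correction, so their results can differ slightly from this plain formula.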
Describes the shape of a probability distribution or the tail behavior of a data set.
β₂ = (Σ((xᵢ - μ)⁴) / N) / (σ⁴)
Positive excess kurtosis (β₂ - 3 > 0)
heavier tails and a higher peak
Negative excess kurtosis (β₂ - 3 < 0)
lighter tails and a flatter peak
Zero excess kurtosis (β₂ = 3)
normal distribution
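A sketch of β₂ = (Σ((xᵢ - μ)⁴) / N) / (σ⁴) in plain Python. It returns both β₂ and the excess kurtosis β₂ - 3: a normal distribution has β₂ = 3, so the positive/negative/zero labels are read off the excess value. The function name and data are hypothetical.

```python
def kurtosis(data):
    """Return (beta2, excess kurtosis) using population moments (divide by N)."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n        # σ²
    fourth_moment = sum((x - mu) ** 4 for x in data) / n
    beta2 = fourth_moment / variance ** 2                  # σ⁴ = (σ²)²
    return beta2, beta2 - 3


beta2, excess = kurtosis([2, 4, 4, 4, 5, 5, 7, 9])
heavy_beta2, heavy_excess = kurtosis([0] * 8 + [10, -10])  # mostly zeros, two extreme values
print(beta2, excess, heavy_excess)
```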
R = (P / 100) * (n + 1), where R is the rank position of the P-th percentile in the sorted data of n values
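A sketch of the rank formula, assuming R is a 1-based position in the sorted data and that a fractional R is resolved by linear interpolation between the two neighboring values. Clamping R to the first or last value at the extremes is an assumption of this sketch. This convention should agree with Python's `statistics.quantiles(..., method="exclusive")`; other tools use other percentile definitions.

```python
def percentile(data, p):
    """P-th percentile via R = (P / 100) * (n + 1) with linear interpolation."""
    values = sorted(data)
    n = len(values)
    r = (p / 100) * (n + 1)  # 1-based rank position
    if r <= 1:               # assumption: clamp below the first value
        return values[0]
    if r >= n:               # assumption: clamp above the last value
        return values[-1]
    lower = int(r)           # integer part of the rank
    fraction = r - lower
    # Interpolate between the lower-th and (lower + 1)-th values (1-based).
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


d = [15, 20, 35, 40, 50]  # hypothetical data
print(percentile(d, 25), percentile(d, 50), percentile(d, 75))
```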
z = (x - μ) / σ
1.Outlier detection
2.Normality testing
3.Standardization
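A minimal z-score sketch using z = (x - μ) / σ with the population σ. Flagging |z| > 3 as an outlier is a common rule of thumb, not something the map specifies; the threshold and data are assumptions.

```python
import math


def z_scores(data):
    """Standardize each value: z = (x - μ) / σ (population σ)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return [(x - mu) / sigma for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
z = z_scores(data)

# Assumed rule of thumb: |z| > 3 marks a potential outlier.
outliers = [x for x, score in zip(data, z) if abs(score) > 3]
print(z, outliers)
```

After standardization the scores have mean 0 and standard deviation 1, which is what makes variables on different scales comparable.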
To check the distribution (frequency on the Y-axis)
To check the distribution (probability density on the Y-axis)
To check outliers
To compare and display frequencies, counts, or proportions
Shows the number of occurrences (counts) of unique values
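The bar heights a count plot draws are simply the counts of each unique value. A minimal sketch with `collections.Counter` on a hypothetical categorical column:

```python
from collections import Counter

# Hypothetical categorical column.
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)  # occurrences of each unique value
for value, count in counts.most_common():
    print(value, count)   # one bar per unique value, height = count
```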
Pairplot
Displays relationships between multiple variables
Quantifies and displays relationships between multiple variables
IQR
IQR = Q3 - Q1
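A sketch of IQR = Q3 - Q1 using the standard-library `statistics.quantiles`, which returns the three quartile cut points when `n=4`. Its default method is `"exclusive"`; other tools may compute quartiles differently and give slightly different results. The data is hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # spread of the middle 50% of the data
print(q1, q3, iqr)
```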
Mean Absolute Deviation
MAD = Σ(|Xi - X̄|) / N
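A direct translation of MAD = Σ(|Xi - X̄|) / N (the mean absolute deviation, not to be confused with the median absolute deviation, which shares the acronym). The function name and data are illustrative.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of each value from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because it uses absolute values instead of squares, MAD stays in the data's original units and is less dominated by extreme values than the variance.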
Measure of Relationship
Quantifies the relationship between two variables
Correlation
Pearson's correlation coefficient
ρ(X,Y) = Cov(X, Y) / (σX · σY)
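Spearman's rank correlation coefficient
ρs = 1 - (6 Σdᵢ²) / (n(n² - 1)), where dᵢ is the difference between the ranks of the i-th pair of values (this form assumes no tied ranks)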
ranges from -1 to +1
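A plain-Python sketch of both coefficients. Pearson uses the N - 1 covariance and matching N - 1 standard deviations (the factors cancel). Spearman uses the rank-difference formula, and the hypothetical `ranks` helper assumes there are no tied values. With y = x², the relationship is monotonic but not linear, so Spearman reaches 1 while Pearson stays slightly below 1.

```python
import math


def pearson(x, y):
    """ρ = Cov(X, Y) / (σX · σY), all with N - 1 in the denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(x, y):
    """ρs = 1 - 6 Σdᵢ² / (n(n² - 1)), dᵢ = rank difference of pair i."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]  # monotonic, non-linear
print(pearson(x, y), spearman(x, y))
```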
Covariance
Cov(X, Y) = Σ((Xi - X̄)(Yi - Ȳ)) / (N - 1)
Disadvantages
1. Covariance is not standardized, so its value depends on the units of the variables
2. Covariance measures only the direction of a relationship, not its strength
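A small sketch of the unit problem: the same hypothetical height/weight data gives a covariance 100 times larger when heights are in centimeters instead of meters, even though the relationship itself is unchanged. Only the sign (direction) is comparable. The `covariance` helper is illustrative and uses the N - 1 formula above.

```python
def covariance(x, y):
    """Cov(X, Y) = Σ((Xi - X̄)(Yi - Ȳ)) / (N - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


heights_cm = [150, 160, 170, 180]         # hypothetical data
heights_m = [h / 100 for h in heights_cm]
weights_kg = [50, 58, 66, 74]

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
print(cov_cm, cov_m)  # same direction (positive), very different magnitudes
```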
1.Missing Values
2.Duplicates
3.Outliers
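A minimal plain-Python sketch of the first two checks on hypothetical records, where `None` marks a missing value. In pandas the same checks are commonly written as `df.isna().sum()` and `df.duplicated().sum()`; outliers can then be checked with the z-score or IQR measures in this map.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Paris"},
    {"age": None, "city": "Oslo"},
    {"age": 25, "city": "Paris"},  # exact duplicate of the first row
    {"age": 31, "city": None},
]

# 1. Missing values per column
missing_per_column = {
    col: sum(1 for row in rows if row[col] is None) for col in rows[0]
}

# 2. Duplicates: rows identical to an earlier row
seen = set()
duplicates = 0
for row in rows:
    key = tuple(sorted(row.items()))
    if key in seen:
        duplicates += 1
    else:
        seen.add(key)

print(missing_per_column, duplicates)
```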
Shape, center, spread, and outliers
4.line chart
trend
pie chart
Percentages or parts of a whole
Continuous vs Continuous
Scatter Plot
Relationship between two variables
Disadvantage: Judging the strength of the relationship by eye is subjective
Continuous vs Categorical
Bar Graph
Stacked bar chart
Categorical vs Categorical
Cross Tab
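A cross tab counts how often each combination of two categorical values occurs. A minimal sketch with `collections.Counter` on hypothetical columns; pandas provides the same table through `pandas.crosstab`.

```python
from collections import Counter

# Hypothetical paired categorical observations.
region = ["north", "south", "south", "north", "south"]
product = ["tea", "coffee", "tea", "coffee", "coffee"]

pair_counts = Counter(zip(region, product))  # count each (region, product) pair
row_labels = sorted(set(region))
col_labels = sorted(set(product))

# Nested dict: table[row][column] = count (0 if the pair never occurs)
table = {r: {c: pair_counts[(r, c)] for c in col_labels} for r in row_labels}

for r in row_labels:
    print(r, [table[r][c] for c in col_labels])
```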
Created using MindMup.com