Exploratory Data Analysis

Non-Visualization (Numerical)
  Measure of Central Tendency
    Purpose: to find the middle values in a data distribution
    1. Mean: average of the numerical values
       Mean = (Sum of all values) / (Number of values)
       Disadvantage: sensitive to outliers
    2. Median: the middle-most value
       Sort the values in ascending order and take the middle value
    3. Mode: the most frequent value
       Find the value that is repeated most often
  Measure of Spread/Dispersion
    Measures how spread out or dispersed the values in a data set are
    Variance
      σ² = Σ(xᵢ - μ)² / N   (population form)
      1. Sensitive to outliers
      2. Expressed in squared units
    Standard Deviation
      σ = sqrt(variance)
    Range
      Range = max - min
    IQR
      IQR = Q3 - Q1
    Mean Absolute Deviation
      MAD = Σ|xᵢ - x̄| / N
  Skewness
    Quantifies the asymmetry of a probability distribution or the shape of a data set
    γ₁ = (Σ(xᵢ - μ)³ / N) / σ³
    γ₁ = 0: perfectly symmetric
    Positive (right) skewness: the tail is on the right
    Negative (left) skewness: the tail is on the left
  Kurtosis
    Describes the tail behavior of a probability distribution or data set
    β₂ = (Σ(xᵢ - μ)⁴ / N) / σ⁴
    The signs below refer to excess kurtosis, β₂ - 3 (a normal distribution has β₂ = 3)
    Positive kurtosis: heavier tails and a higher peak
    Negative kurtosis: lighter tails and a flatter peak
    Zero: normal distribution
  Percentiles
    R = (P / 100) * (n + 1)
    R is the rank (position) of the P-th percentile among n sorted values
  Z Score
    z = (x - μ) / σ
    1. Outlier detection
    2. Normality testing
    3. Standardization
  Measure of Relationship
    Quantifies the relationship between two variables
    Correlation (ranges from -1 to +1)
      Pearson's correlation coefficient
        ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
      Spearman's rank correlation coefficient
        ρ_s = 1 - 6 Σdᵢ² / (n(n² - 1))   (dᵢ = difference between the two ranks of observation i; assumes no tied ranks)
    Covariance
      Cov(X, Y) = Σ((Xᵢ - X̄)(Yᵢ - Ȳ)) / (N - 1)   (sample form)
      Disadvantages
        1. Covariance is not standardized, so its value depends on the units of the data
        2. Covariance measures only the direction of a relationship, not its strength

Visualization
  Purpose
    1. Missing values
    2. Duplicates
    3. Outliers
    4. Break down the data
    5. Comparison
    6. Distribution: shape, center, spread, and outliers
    7. Relationship
    8. Trend
  Types
    Univariate Plots
      Continuous
        1. Histogram: to check the distribution; frequency on the Y-axis
        2. Density Plot: to check the distribution; probability density on the Y-axis
        3. Boxplot: to check outliers
        4. Line chart: trend
      Categorical
        Bar Plot: to compare and display frequencies, counts, or proportions
        Count plot: the number of occurrences (counts) of each unique value
        Pie chart: percentages or parts of a whole
    Bivariate Plots
      Continuous vs Continuous
        Scatter Plot: relationship between two variables
        Disadvantage: the strength of the relationship is judged subjectively
      Continuous vs Categorical
        Bar Graph
        Stacked bar chart
      Categorical vs Categorical
        Cross Tab
    Multivariate Plots
      Pairplot: displays the relationships between multiple variables
      Heat map: quantifies and displays the relationships between multiple variables
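
The numerical measures in the map can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The function names (`describe`, `z_score`, `percentile_rank`, `covariance`, `pearson`, `spearman`) and the sample data are assumptions made for the example. The quartiles come from `statistics.quantiles`, whose default "exclusive" method uses (n + 1)-based positions like the map's percentile formula, and the Spearman shortcut assumes there are no tied values.

```python
import math
import statistics as st


def describe(values):
    """Compute the map's univariate measures (population forms, divide by N)."""
    xs = sorted(values)
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n          # σ² = Σ(xᵢ - μ)² / N
    sigma = math.sqrt(var)
    q1, _, q3 = st.quantiles(xs, n=4)                 # quartiles, (n + 1)-based positions
    beta2 = sum((x - mu) ** 4 for x in xs) / n / sigma ** 4
    return {
        "mean": mu,
        "median": st.median(xs),
        "mode": st.mode(xs),
        "variance": var,
        "std": sigma,
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        "mad": sum(abs(x - mu) for x in xs) / n,      # mean absolute deviation
        "skewness": sum((x - mu) ** 3 for x in xs) / n / sigma ** 3,
        "kurtosis": beta2,                            # equals 3 for a normal distribution
        "excess_kurtosis": beta2 - 3,                 # equals 0 for a normal distribution
    }


def z_score(x, mu, sigma):
    """z = (x - μ) / σ"""
    return (x - mu) / sigma


def percentile_rank(p, n):
    """R = (P / 100) * (n + 1): position of the P-th percentile in n sorted values."""
    return p / 100 * (n + 1)


def covariance(xs, ys):
    """Sample covariance, dividing by N - 1 as in the map."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Cov(X, Y) / (σ_X · σ_Y), using sample standard deviations to match N - 1."""
    return covariance(xs, ys) / (st.stdev(xs) * st.stdev(ys))


def spearman(xs, ys):
    """1 - 6 Σdᵢ² / (n(n² - 1)); valid only when neither list contains ties."""
    rank_x = {x: i + 1 for i, x in enumerate(sorted(xs))}
    rank_y = {y: i + 1 for i, y in enumerate(sorted(ys))}
    n = len(xs)
    d2 = sum((rank_x[x] - rank_y[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]                   # made-up sample data
    summary = describe(data)
    for name, value in summary.items():
        print(f"{name}: {value}")
    print("z-score of 9:", z_score(9, summary["mean"], summary["std"]))
    print("Pearson:", pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Pearson and Spearman both fall between -1 and +1 whatever the units of the data, while the covariance value changes with the units. That is the map's point about covariance showing direction but not strength.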

Created using MindMup.com