Descriptive Statistics in R: Analyzing Car and Iris Datasets, Exercises of Library science

An overview of how to perform descriptive statistics analysis using the R programming language. It covers the key measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation) that are commonly used to summarize and understand data. The document demonstrates these techniques on popular built-in datasets, 'mtcars' and 'iris', using R functions such as summary(), str(), and aggregate(). The goal is to help students and researchers understand their data through descriptive analysis, which is a crucial first step in many machine learning and data science workflows. Readers will learn how to extract meaningful insights from small to medium-sized datasets, laying the foundation for more advanced data analysis and modeling.

Typology: Exercises

2019/2020

Uploaded on 11/12/2022

amritanshu-diwakar

G. H. RAISONI COLLEGE OF ENGG., NAGPUR
(An Autonomous Institute under UGC Act 1956)
Department of Computer Science & Engg.
Practical Subject: Data Analytics with R
Student Details:
Roll Number: 28
Name: Amritanshu Rajeev Diwakar
Semester: 4
Section: A
Batch: A2
Practical Details: Practical Number 2
Practical Aim: DESCRIPTIVE STATISTICS IN R
a. Write an R script to find basic descriptive statistics using the summary(), str(), and quantile() functions on the mtcars and cars datasets.
b. Write an R script to find a subset of a dataset using the subset() and aggregate() functions on the iris dataset.
Theory
In descriptive analysis, we describe our data using representative methods such as charts, graphs, tables, and Excel files, and present it in a meaningful way so that it can be easily understood.
It is most often performed on small data sets, and its findings can help indicate future trends. The measures used to describe a data set fall into two groups: measures of central tendency and measures of variability (dispersion).
Process of Descriptive Analysis
The measure of central tendency
The measure of variability
Measure of central tendency
A measure of central tendency represents the whole data set by a single value that locates its central point. There are three main measures of central tendency:
Mean
Mode
Median
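
As a minimal sketch of these three measures, the snippet below uses the built-in mtcars dataset. Base R provides mean() and median(), but its mode() function returns the storage type of an object rather than the most frequent value, so get_modes() is a hypothetical helper written for this illustration and is not part of R or of the original practical.

```r
# Fuel economy (miles per gallon) from the built-in mtcars dataset
mpg <- mtcars$mpg

mean_mpg   <- mean(mpg)    # sum of the values divided by their count
median_mpg <- median(mpg)  # middle value of the sorted data

# Hypothetical helper: returns every value that shares the highest frequency.
# (Base R's mode() reports the storage type, not the statistical mode.)
get_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Number of cylinders is a natural column for the mode
cyl_mode <- get_modes(mtcars$cyl)

print(mean_mpg)
print(median_mpg)
print(cyl_mode)
```

Returning a vector from the helper keeps the case of several modes visible instead of silently picking one.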

Measure of variability

A measure of variability describes the spread of the data, that is, how widely the values are distributed. The most common measures of variability are:
Range
Variance
Standard deviation
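
These measures can be sketched with base R functions on the mpg column of mtcars. The variable names are illustrative. Note that range() returns the minimum and maximum rather than a single number, and that var() and sd() use the sample (n - 1) denominator.

```r
mpg <- mtcars$mpg

mpg_range  <- range(mpg)       # c(minimum, maximum)
mpg_spread <- diff(mpg_range)  # maximum minus minimum
mpg_var    <- var(mpg)         # sample variance (divides by n - 1)
mpg_sd     <- sd(mpg)          # sample standard deviation

print(mpg_range)
print(mpg_spread)
print(mpg_var)
print(mpg_sd)
```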

Need for Descriptive Analysis

Descriptive analysis helps us understand our data and is an important part of machine learning. Machine learning is about making predictions, whereas statistics is about drawing conclusions from data, and drawing those conclusions is a necessary first step for machine learning. Let's do this descriptive analysis in R.

Descriptive Analysis in R

Descriptive analysis consists of describing the data simply, using summary statistics and graphics. Here, we describe how to compute summary statistics in R.

Mean
It is the sum of the observations divided by the total number of observations. It is also called the average.

Median
It is the middle value of the sorted data set, splitting the data into two halves. If the number of elements is odd, the median is the center element; if it is even, the median is the average of the two central elements.

Mode
It is the value that has the highest frequency in the data set. The data set may have no mode if all data points occur equally often, and it may have more than one mode if two or more values share the highest frequency.

Variance
It is the average squared deviation from the mean. It is calculated by finding the difference between each data point and the mean, squaring the differences, adding them up, and dividing by the number of data points.

Standard Deviation
It is the square root of the variance. It is calculated by finding the mean, subtracting the mean from each value and squaring the result, adding up the squared values, dividing by the number of terms, and taking the square root.
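
The definitions above can be checked by hand on a small vector. The sketch below uses made-up illustrative values (not data from the practical) and computes each measure step by step. One caveat worth stating: dividing by the number of data points gives the population variance, whereas R's built-in var() and sd() divide by n - 1 and so return the sample versions.

```r
# Illustrative values only, chosen so the arithmetic is easy to follow
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Mean: sum of observations divided by their count (40 / 8)
mean_by_hand <- sum(x) / n

# Median: center element if n is odd, average of the two central ones if even
sorted <- sort(x)
median_by_hand <- if (n %% 2 == 1) {
  sorted[(n + 1) / 2]
} else {
  mean(sorted[c(n / 2, n / 2 + 1)])
}

# Variance: average squared deviation from the mean
sq_dev         <- (x - mean_by_hand)^2
var_population <- sum(sq_dev) / n        # as defined in the text
var_sample     <- sum(sq_dev) / (n - 1)  # what var(x) computes

# Standard deviation: square root of the variance
sd_population <- sqrt(var_population)

print(c(mean = mean_by_hand, median = median_by_hand))
print(c(var_population = var_population, var_sample = var_sample))
print(sd_population)
```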

Some more R functions used in Descriptive Analysis:
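
The original script for this part is not reproduced in the text, so the following is only a sketch of what part (a) of the aim might look like, using str(), summary(), and quantile() on the built-in mtcars and cars datasets. The variable names are assumptions made for illustration.

```r
# Structure: dimensions, column names, and column types
str(mtcars)
str(cars)

# Per-column minimum, quartiles, median, mean, and maximum
print(summary(mtcars))
print(summary(cars))

# Quartiles of a single column; probs selects which quantiles to return
mpg_quartiles <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
print(mpg_quartiles)

# With default arguments, quantile() returns the 0%, 25%, 50%, 75%, 100% points
speed_quantiles <- quantile(cars$speed)
print(speed_quantiles)
```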

Iris
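
The script for part (b) of the aim is likewise not reproduced in the text. A minimal sketch, assuming the built-in iris dataset and illustrative variable names, could use subset() to filter rows and columns and aggregate() to compute per-species statistics:

```r
# Rows belonging to one species
setosa <- subset(iris, Species == "setosa")

# Rows matching a condition, keeping only selected columns
long_petals <- subset(iris, Petal.Length > 5,
                      select = c(Petal.Length, Petal.Width, Species))

# Mean of every numeric column, grouped by species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Median of a single column, grouped by species
sepal_medians <- aggregate(Sepal.Length ~ Species, data = iris, FUN = median)

print(head(setosa))
print(head(long_petals))
print(species_means)
print(sepal_medians)
```

The formula interface to aggregate() is used here because it keeps the grouping column named Species in the result.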

Output Screenshot: IRIS

Conclusion

I have successfully implemented descriptive statistics on the car (mtcars and cars) and iris datasets in R.