
Weather Forecasting System using Linear Regression: A Project Based Lab Report, Study notes of Data Warehousing


Typology: Study notes; 2020/2021; uploaded on 10/24/2021 by sandeep-s-1 🇮🇳

PROJECT BASED LAB REPORT
On
(Weather Forecasting)
Submitted in partial fulfilment of the
Requirements for the award of the Degree of
Bachelor of Technology
In
Computer science and Engineering
Under the esteemed guidance of
(Dr. K. Bhanu Prakash)
(Professor)
By
STUDENT ID STUDENT NAME
170030104 Bandi. Sandeep Reddy
170031008 P B N Anusha
(DST-FIST Sponsored Department)
K L EDUCATION FOUNDATION
Green Fields, Vaddeswaram, Guntur District-522 502
2019-2020

TABLE OF CONTENTS
ABSTRACT
CHAPTER 1: INTRODUCTION
  1.1 INTRODUCTION
  1.2 PROBLEM DEFINITION
  1.3 SCOPE
  1.4 PURPOSE
  1.5 PROBLEM AND EXISTING TECHNOLOGY
  1.6 PROPOSED SYSTEM
CHAPTER 2: REQUIREMENTS & ANALYSIS
  2.1 PLATFORM REQUIREMENTS
  2.2 MODULE DESCRIPTION
CHAPTER 3: DESIGN & IMPLEMENTATION
  3.1 ALGORITHMS
  3.2 PSEUDO CODE
CHAPTER 4: SCREENSHOTS
CHAPTER 5: CONCLUSION
CHAPTER 6: REFERENCES


1. INTRODUCTION

Data Warehousing
A data warehouse is an electronic store of a large amount of business information, designed for query and analysis rather than for transaction processing. Data warehousing is the process of transforming data into information and making it available to users for analysis.

Data Mining
Data mining is the search for hidden, valid, and potentially useful patterns in large data sets; it is about discovering previously unknown relationships in the data. It is a multidisciplinary skill that draws on machine learning, statistics, AI, and database technology.

1.1 Introduction
Rainfall prediction is the application of science and technology to predict the amount of rainfall over a region. Accurate rainfall estimates are important for the effective use of water resources, for crop productivity, and for planning water structures in advance. In this project, we use linear regression to predict the amount of rainfall: from a day's other weather measurements, the model estimates how many inches of precipitation to expect.

1.2 Problem Definition
Rainfall must be estimated accurately to make effective use of water resources, to support crop productivity, and to plan water structures in advance.

1.3 Scope
The system estimates how many inches of rainfall can be expected.

1.4 Purpose
Weather forecasts are important for several reasons. Forecasting is a product of science that affects the lives of many people, and forecasts would certainly be missed if they were not available. The following list gives various reasons why weather forecasts are important:

  1. Helps people decide how to dress (e.g. warm, cold, windy, or rainy weather)
  2. Helps businesses and people plan power production and consumption (e.g. power companies, thermostat settings)
  3. Helps people decide whether to take extra gear for the weather (e.g. umbrella, raincoat, sunscreen)
  4. Helps people plan outdoor activities (e.g. whether rain, storms, or cold weather will affect an outdoor event)
  5. Helps curious people know what sort of weather to expect (e.g. snow on the way, severe storms)
  6. Helps businesses plan for transportation hazards caused by the weather (e.g. fog, snow, ice, storms, and clouds as they affect driving and flying)
  7. Helps people with health-related issues plan the day (e.g. allergies, asthma, heat stress)
  8. Helps businesses and people plan for severe weather and other weather hazards (e.g. lightning, hail, tornadoes, hurricanes, ice storms)
  9. Helps farmers and gardeners plan crop irrigation and protection (e.g. irrigation scheduling, freeze protection)


2. REQUIREMENTS

2.1 Platform Requirements

Hardware:
  Processor: i
  RAM: 2 GB
  Hard disk: 250 GB
Software:
  OS: Windows, Linux
  Jupyter Notebook
  Python 3
  Python IDE: Microsoft Azure

2.2 Modules Description
In this project we have two modules:

  1. Data gathering and pre-processing.
  2. Applying the algorithm for prediction.

Explanation:


  1. In this module, we first gather the dataset for our prediction model. Real-world data is rarely ready to use: most of it is messy and unstructured, and datasets large and small come with a variety of issues, such as invalid fields, missing or extra values, and values in a different form from the one we require. To bring the data into a workable, structured form, we need to "clean" it. Common cleaning steps include parsing, converting to one-hot encoding, and removing unnecessary data. In our case, some days have factors that were not recorded, and the precipitation column (in inches) is marked "T" when there was only trace precipitation. The model requires numeric input, so it cannot work with letters appearing in the data; we therefore clean the data before passing it to the model.

  2. Once the data is cleaned, this module uses it as input to our linear regression model. Linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent (explanatory) variables. It fits the line (with several inputs, the hyperplane) that best matches the data points, that is, the one with the smallest sum of squared errors. Predictions ("how much") are obtained by substituting the independent values into the fitted equation. We use scikit-learn's LinearRegression model to train on our dataset. Once the model is trained, we can supply our own values for columns such as temperature, dew point, and pressure to predict precipitation from these attributes.
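To make "the best fit, with the least errors" concrete, the sketch below uses made-up data (three numeric columns standing in for features such as temperature, dew point, and pressure; it is not the project's dataset). It solves the ordinary least-squares problem directly with NumPy and checks that scikit-learn's `LinearRegression` arrives at the same intercept and coefficients. It then shows that a prediction is just the new feature values substituted into the fitted equation. The variable names are illustrative assumptions, not the project's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Made-up feature matrix: 100 "days" x 3 numeric columns.
X = rng.normal(size=(100, 3))
true_coef = np.array([0.3, -0.2, 0.1])
# Made-up label with a little noise, standing in for precipitation.
y = X @ true_coef + 0.05 + rng.normal(scale=0.01, size=100)

# Ordinary least squares by hand: prepend a column of ones for the
# intercept and solve min ||A w - y||^2.
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept_ls, coef_ls = w[0], w[1:]

# scikit-learn's LinearRegression minimises the same squared error.
model = LinearRegression().fit(X, y)
print("intercept:", intercept_ls, model.intercept_)
print("coefficients:", coef_ls, model.coef_)

# Predicting = substituting new feature values into the fitted equation.
new_day = np.array([[0.5, -1.0, 2.0]])
print(model.predict(new_day), intercept_ls + new_day @ coef_ls)
```

Because both routes minimise the same objective, the two sets of numbers agree up to floating-point rounding.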

Linear regression predicts the value of a dependent variable (y) from a given independent variable (x); that is, it finds a linear relationship between x (input) and y (output), hence the name linear regression. In the figure above, X (input) is the work experience and Y (output) is the salary of a person. The regression line is the best-fit line for our model.
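For a single input x, the best-fit line y = mx + c has a closed-form ordinary least-squares solution: the slope is m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2), and the intercept is c = y_mean - m * x_mean, so the fitted line passes through the point of means. A minimal NumPy sketch on made-up points follows; the function name `fit_line` is hypothetical and not part of the project.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares for y = m*x + c with one input variable."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the best-fit line passes through (x_mean, y_mean).
    c = y_mean - m * x_mean
    return m, c


# Toy points lying exactly on y = 2x + 1, so the fit should recover m = 2, c = 1.
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)
```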

Hypothesis function for linear regression: y = mx + c, where y is the response variable, x is the predictor variable, and m (the slope) and c (the intercept) are constants called the coefficients. With several predictor variables, as in this project, the hypothesis extends to y = c + m1*x1 + m2*x2 + ... + mn*xn.

2.3 Data Set
The dataset is a public weather dataset for Austin, Texas, available on Kaggle (austin_weather.csv). Columns:
  Date - the date of the collection (YYYY-MM-DD)
  TempHighF - high temperature, in degrees Fahrenheit
  TempAvgF - average temperature, in degrees Fahrenheit
  TempLowF - low temperature, in degrees Fahrenheit
  DewPointHighF -

  … - average visibility, in miles
  VisibilityLowMiles - low visibility, in miles
  WindHighMPH - high wind speed, in miles per hour
  WindAvgMPH - average wind speed, in miles per hour
  WindGustMPH - highest wind speed gust, in miles per hour
  PrecipitationSumInches - total precipitation, in inches ('T' if trace)
  Events - adverse weather events (' ' if none)
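Because PrecipitationSumInches can contain 'T' and Events holds free text, not every column is numeric as loaded. A minimal cleaning sketch follows, assuming that trace precipitation counts as 0 inches, that Date and Events are dropped, and that unrecorded readings are marked '-' (an assumption; the marker is not shown here). The function name `clean_weather` is hypothetical, and the three rows are made up, not taken from the real file.

```python
import pandas as pd


def clean_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every remaining column is numeric."""
    # Assumption: Date and Events are not used as model inputs, so drop them.
    df = df.drop(columns=["Date", "Events"], errors="ignore")
    # 'T' marks trace precipitation in PrecipitationSumInches; count it as 0.0 inches.
    df = df.replace("T", 0.0)
    # Assumption: unrecorded readings are marked '-'; also count them as 0.0.
    df = df.replace("-", 0.0)
    # Every remaining value should now parse as a number.
    return df.astype(float)


# Three made-up rows in the shape of austin_weather.csv (not real data).
raw = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-02", "2014-01-03"],
    "TempAvgF": ["55", "61", "48"],
    "HumidityAvgPercent": ["70", "-", "64"],
    "PrecipitationSumInches": ["0.12", "T", "0"],
    "Events": ["Rain", " ", " "],
})
clean = clean_weather(raw)
print(clean)
print(clean.dtypes)
```

Filling unrecorded readings with 0 is the simplest choice but can bias columns such as humidity; dropping those rows or filling with a column mean are common alternatives.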


3. DESIGN AND IMPLEMENTATION

3.1 Algorithms
Linear Regression:
  Module 1: Data gathering and pre-processing.
  Module 2: Applying the algorithm for prediction.

3.2 Source Code

```python
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# read the data in a pandas dataframe
data = pd.read_csv("C:/Users/TEMP.SANDEEP/Desktop/austin_weather.csv")
# seeing head values
print(data.head(5))
# seeing shape of the dataset
print(data.shape)
```

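```python
# ...
plt.show()
# basic static

# save the data in a csv file
# (to_csv also writes the row index as an extra column unless index=False is passed)
data.to_csv('C:/Users/TEMP.SANDEEP/Desktop/austin_final_final.csv')
```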

```python
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# read the cleaned data
data = pd.read_csv("C:/Users/TEMP.SANDEEP/Desktop/austin_final_final.csv")

# the features or the 'x' values of the data:
# these columns are used to train the model.
# the last column, i.e. the precipitation column,
# will serve as the label
X = data.drop(['PrecipitationSumInches'], axis=1)

# the output or the label
Y = data['PrecipitationSumInches']
# reshaping it into a 2-D vector
Y = Y.values.reshape(-1, 1)

# consider a random day in the dataset;
# we shall plot a graph and observe this day
day_index = 798
days = list(range(Y.size))

# initialize a linear regression model
clf = LinearRegression()

# train the model with our input data
clf.fit(X, Y)

# give a sample input to test our model:
# a 2-D array (one row) that contains a value
# for each feature column in the dataset
inp = np.array([[74], [60], [45], [67], [49], [43], [33], [45], [57],
                [29.68], [10], [7], [2], [0], [20], [4], [31]])
inp = inp.reshape(1, -1)

# print the output
print('The precipitation in inches for the input is:', clf.predict(inp))
```
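Passing the sample input as a bare array relies on its values being in exactly the same order as the columns of X, and scikit-learn warns when a model fitted on a DataFrame is given an array without feature names. One more robust alternative is to build the query as a one-row DataFrame keyed by column name and reorder it to the training columns. The sketch below assumes a hypothetical three-column training frame with made-up values, not the project's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical training frame using three of the dataset's column names.
train = pd.DataFrame({
    "TempHighF": rng.uniform(50, 100, 50),
    "TempAvgF": rng.uniform(40, 90, 50),
    "WindAvgMPH": rng.uniform(0, 20, 50),
})
# Made-up label: an exact linear function of the features, so the fit is exact.
train["PrecipitationSumInches"] = 0.04 * train["WindAvgMPH"] + 0.002 * train["TempAvgF"]

X = train.drop(columns=["PrecipitationSumInches"])
y = train["PrecipitationSumInches"]
model = LinearRegression().fit(X, y)

# One day's readings, given by column name rather than by position.
readings = {"WindAvgMPH": 7, "TempHighF": 74, "TempAvgF": 60}
# reindex puts the values in the training column order; a missing name would appear as NaN.
query = pd.DataFrame([readings]).reindex(columns=X.columns)
prediction = model.predict(query)[0]
print(f"Predicted precipitation: {prediction:.3f} inches")
```

Here the expected value is 0.04 * 7 + 0.002 * 60 = 0.40 inches, which the exact fit reproduces.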

```python
# plot a graph of the precipitation levels
# versus the total number of days.
# one day, which is in red, is
# ...
plt.scatter(days[day_index], x_vis[x_vis.columns.values[i]][day_index], color='r')
plt.title(x_vis.columns.values[i])
plt.show()
```

OUTPUT:
The precipitation in inches for the input is: [[1.33868402]]

Graphs:

  1. Histogram for Temp
  2. The precipitation trend graph: