Uploaded by Rabeya Akhter Sumi

Assignment 1

Introduction
In this assignment we were provided with a data set that includes both a training set and a test set. The main purpose of this
assignment is to predict the Class value for the test set using the k-nearest neighbours (KNN) algorithm, with the model trained on the
provided training set.

Data Analysis:
Analyzing the data before building a model on it helps to increase the accuracy of the model. The data provided consist of
10 variables. A statistical analysis of the data was carried out as the first stage, and the results are given
below.

Statistical Analysis of Data

Data Set Analysis

The provided data set was analyzed at an early stage to gain basic insight into the data, so that a statistical model could be built
on it. The following conclusions were drawn.
Dataset info
  Number of variables      10
  Number of observations   204
  Total Missing (%)        0.00%

Variables types
  Numeric       10
  Categorical   0
  Boolean       0
  Date          0
Descriptive Statistics
A descriptive analysis was carried out to obtain quantitative descriptions of the data. The results of this analysis
are provided below.

Correlation Analysis
Predictions can be made more accurate by removing unwanted variables. A good way of finding them is to
inspect the correlation matrix, identify the variables that are highly correlated with each other, and then remove one
variable from each such pair to improve accuracy.

Here, instead of checking a single simple correlation, we computed both the Pearson and the Spearman correlation
matrices.

Clearly, there is a high chance that Ca and RI carry duplicate information, with a correlation value of 0.8108. So, we can reject one of
the two variables.
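
The check described above can be sketched as follows. This is only an illustration, not the code used for the assignment: the helper names (pearson, ranks, spearman, highly_correlated), the 0.8 threshold, and the synthetic columns are all assumptions. Only the column names RI and Ca come from the report; Na is a hypothetical third column, and the values are randomly generated stand-ins, not the assignment's data.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def highly_correlated(columns, threshold=0.8):
    """List (name_a, name_b, pearson, spearman) for pairs at or above threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            p = pearson(columns[a], columns[b])
            s = spearman(columns[a], columns[b])
            if abs(p) >= threshold or abs(s) >= threshold:
                pairs.append((a, b, round(p, 4), round(s, 4)))
    return pairs

# Synthetic stand-in data (NOT the assignment's data set): Ca is built to
# depend on RI, while Na is generated independently of both.
random.seed(0)
ri = [random.gauss(1.518, 0.003) for _ in range(204)]
ca = [8.9 + 400 * (v - 1.518) + random.gauss(0, 0.5) for v in ri]
na = [random.gauss(13.4, 0.8) for _ in range(204)]

flagged = highly_correlated({"RI": ri, "Ca": ca, "Na": na})
print(flagged)  # pairs that are candidates for dropping one variable
```

Each flagged pair is a candidate for rejecting one of its two variables; which one to keep is a separate judgment that the sketch does not make.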

Methodology used:
1. We used the Euclidean distance formula to find the nearest neighbours of each data point.
2. Cross-validation was used to find the optimum value of K, i.e. the value for which the result is most accurate.
Cross-validation result file: result.csv
Correlation matrix for cross-validation: corrfile.csv
3. To test this, we split the training data set into two sets: train and validate.
4. We predicted the value of the Class column for the validate set, where this value was already known.
5. We used values of K from 1 to 20.
6. For each K from 1 to 20, the correlation between the predicted values and the known expected values was computed and compared.
7. The K with the maximum correlation value was selected as the optimum value of K.
8. The following files were created while working on the provided data set:
a. Result set from different K values.
b. Correlation matrix

9. Once the value of K was decided, the model was built on the provided training data and the result was predicted.
10. The result, with probabilities, for the given data set is given below for K=2.
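
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind result.csv or corrfile.csv: the function names (euclidean, knn_predict, pearson, choose_k), the 70/30 split ratio, and the two-feature synthetic data are all hypothetical. The "probability" is taken here to be the share of the K neighbours that voted for the predicted class, which is one plausible reading of step 10.

```python
import math
import random
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, point, k):
    """Return (predicted_class, probability) from the k nearest training rows."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda row: euclidean(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps the class that was counted first,
    # i.e. the class of the closer neighbour.
    label, count = votes.most_common(1)[0]
    return label, count / k

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                      sum((b - mean_y) ** 2 for b in y))
    return cov / denom if denom else 0.0

def choose_k(train_X, train_y, val_X, val_y, k_values=range(1, 21)):
    """Score each K by the correlation between predicted and known classes."""
    scores = {}
    for k in k_values:
        predicted = [knn_predict(train_X, train_y, p, k)[0] for p in val_X]
        scores[k] = pearson(predicted, val_y)
    best_k = max(scores, key=scores.get)  # first K reaching the maximum score
    return best_k, scores

# Synthetic stand-in data (NOT the assignment's data set): two numeric
# features and two well-separated classes labelled 1 and 2.
random.seed(1)
rows = [([random.gauss(c * 3.0, 1.0), random.gauss(c * 3.0, 1.0)], c)
        for c in (1, 2) for _ in range(60)]
random.shuffle(rows)

# Step 3: split the training data into train and validate (70/30 is assumed).
split = int(0.7 * len(rows))
train, validate = rows[:split], rows[split:]
train_X, train_y = [f for f, _ in train], [c for _, c in train]
val_X, val_y = [f for f, _ in validate], [c for _, c in validate]

# Steps 4 to 7: predict on validate for K = 1..20 and keep the best K.
best_k, scores = choose_k(train_X, train_y, val_X, val_y)

# Step 9: predict a new point with the chosen K.
label, prob = knn_predict(train_X, train_y, [3.0, 3.0], best_k)
print(best_k, label, prob)
```

Correlation is used as the selection score only because the report describes it that way; it requires numeric class labels, and the share of correctly predicted validation rows could be swapped in inside choose_k without changing the rest of the sketch.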
