Filling in Missing Data

TABLE OF CONTENTS
1 Introduction
2 Overview of Algorithm
3 Solution
3.1 Creating the profiles
3.2 Correcting odd or absent data
4 Experiment 1: Correction of mechanically limited, highly biased and irregular distortion
4.1 Setting up the data
4.2 Profiling each nominator
4.3 The original table replaced with characteristic values
4.4 Correction Proportion
5 Discussion
6 Limitations
1 INTRODUCTION
When people provide nominal data there can be biases and omissions that make the data difficult to
work with. Filling in the incomplete data allows an analyst to use their normal methodology in
further assessment, instead of having to create new practices and programs that can handle
irregularly distorted data.
This algorithm is not limited to people and can be used with sensors, pathing algorithms, control
systems and government department spending analyses.
2 OVERVIEW OF ALGORITHM
This algorithm uses the apparent bias of a person in the data they have provided to fill in data that
they have omitted or to correct for wildly uncharacteristic values.
1. Calculate the mean of the data.
2. Create the bias of each nominator using the available data and the mean.
3. Calculate the expected mean using the biases and the mean.
4. Calculate the expected data from the expected mean.
5. Continue the analysis using the complete data as usual.
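The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: absent values are marked with `None`, the bias is taken to be the distance from the variable's mean measured in sample standard deviations, and the name `fill_missing` is hypothetical. Detection of odd values is left out; a value judged odd could simply be set to `None` before calling the function.

```python
from statistics import mean, stdev


def fill_missing(rows):
    """Fill absent entries; rows[i][j] is nominator j's value for variable i, or None.

    Assumes every variable has at least two available values and every
    nominator has answered at least one variable with non-zero spread.
    """
    n_nominators = len(rows[0])

    # Step 1: mean and spread of each variable from the available values.
    mus, sigmas = [], []
    for row in rows:
        present = [x for x in row if x is not None]
        mus.append(mean(present))
        sigmas.append(stdev(present))

    # Step 2: bias of each nominator, averaged over the variables they answered.
    biases = []
    for j in range(n_nominators):
        per_value = [(row[j] - mu) / sd
                     for row, mu, sd in zip(rows, mus, sigmas)
                     if row[j] is not None and sd > 0]
        biases.append(mean(per_value))

    filled = []
    for row, sd in zip(rows, sigmas):
        # Step 3: each available value implies a mean once its nominator's
        # bias is removed; the implied means are averaged.
        expected_mean = mean(x - biases[j] * sd
                             for j, x in enumerate(row) if x is not None)
        # Step 4: expected value for each absent entry.
        filled.append([x if x is not None else expected_mean + biases[j] * sd
                       for j, x in enumerate(row)])
    return filled


# Hypothetical data: three variables, three nominators, one absent value.
example = [[1, 2, 3], [2, 3, 4], [3, None, 5]]
print(fill_missing(example)[2])
```

In this example the middle nominator sits exactly on the mean of the other rows, so the absent value is filled with the expected mean of its row.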
3 SOLUTION
The data are set up as a simple graphical example.
[Figure: graphical example of the data set-up]
3.1 CREATING THE PROFILES

When determining the biases, all of the available data is used.
The structure is a set of variables that are not necessarily related to each other.
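$$S = \{v_1, v_2, \dots, v_m\}$$
where $v_i$ is the $i$-th variable and $m$ is the number of variables in the structure.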
Each variable is a function of the given values from the nominators. In this case it is the mean of the
values given by the nominators. This algorithm does not interfere with any continuous function
applied to create the structure values.
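$$v_i = f(x_{i,1}, x_{i,2}, \dots, x_{i,k})$$
where $x_{i,j}$ is the value given by nominator $j$ for variable $i$ and $k$ is the number of nominators.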
The nominators share the same set of variables, and each gives its assessment of every variable. The
nominators could be a panel describing levels of infrastructure, or different cities describing their
various contributions to carbon emissions (which need to be redefined as inefficient systems and
excess-waste-producing systems, since carbon emissions are not inherently relevant).
$$N = \{n_1, n_2, \dots, n_k\}$$
where $n_j$ is the $j$-th nominator and $k$ is the number of nominators.
Christopher Lindfield
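$$n_j = (x_{1,j}, x_{2,j}, \dots, x_{m,j})$$
where $x_{i,j}$ is the assessment given by nominator $n_j$ for variable $i$.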
The mean, variance and standard deviation are now determined.
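$$\mu_i = \frac{1}{k}\sum_{j=1}^{k} x_{i,j}$$
$$\sigma_i^2 = E\left[(x_{i,j} - \mu_i)^2\right]$$
$$\sigma_i = \sqrt{E\left[(x_{i,j} - \mu_i)^2\right]}$$
where $x_{i,j}$ is the value given by nominator $j$ for variable $i$, $k$ is the number of nominators, and the expectation is taken over the nominators for each variable $i$.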
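Next, the nominator bias is determined. Note that probability functions are superior for generating
general curves but do not allow discrete inputs without curve estimation. The discrete statistics allow
for continuous inputs using quantisation and weightings determined from integration.
The mean bias determines how much higher or lower this nominator is relative to the other
nominators.
$$b_{i,j} = \frac{x_{i,j} - \mu_i}{\sigma_i}$$
$$B_j = \frac{1}{m}\sum_{i=1}^{m} b_{i,j}$$
where $b_{i,j}$ is the bias of nominator $j$ on variable $i$, $\mu_i$ and $\sigma_i$ are the mean and standard deviation of variable $i$, and $B_j$ is the mean (characteristic) bias of nominator $j$ over the $m$ variables.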
Using the profile bias of the nominator from the total analysis at the beginning is important at the
individual variable level as there may be uncharacteristic local bias within a structure, which is
substantially decreased by including the global bias.
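As a worked illustration, the figures printed for Experiment 1 in section 4 can be substituted into the bias calculation. The symbols are assumptions introduced here for readability: $b_{i,1}$ is the bias of nominator N1 on the $i$-th variable, and $B_1$ is N1's characteristic bias. For the fourth variable N1 gave 2.00, the structure value is 3.75 and the standard deviation is 1.58:

$$b_{4,1} = \frac{2.00 - 3.75}{1.58} \approx -1.11$$

Averaging N1's eight per-value biases gives the characteristic bias:

$$B_1 = \frac{-0.52 - 0.86 - 1.21 - 1.11 - 1.00 - 0.90 - 0.79 - 1.10}{8} \approx -0.94$$

Both results agree with the values printed in section 4.2.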
3.2 CORRECTING ODD OR ABSENT DATA

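First, each datum that is not odd or absent along a related line is used to generate an expected mean
that is higher or lower according to the bias of the nominator.
These expected means are then averaged to get the reference mean that would be expected if the
distribution were usual.
New data are generated using this expected mean so that the analysis can be done as if the data set
were complete.
$$\hat{x}_{i,j} = \hat{\mu}_i + B_j\,\sigma_i$$
where $\hat{x}_{i,j}$ is the generated value for nominator $j$ on variable $i$, $\hat{\mu}_i$ is the reference (expected) mean of variable $i$, $B_j$ is the characteristic bias of nominator $j$, and $\sigma_i$ is the standard deviation of variable $i$.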
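The three steps described above can be written out as a short derivation. This is one reading of the text rather than a formula confirmed by the source, and the symbols are assumptions: $x_{i,j}$ is the value from nominator $j$ for variable $i$, $\sigma_i$ is the standard deviation of variable $i$, $B_j$ is the characteristic bias of nominator $j$, and $J_i$ is the set of nominators whose value for variable $i$ is neither odd nor absent.

$$
\begin{aligned}
b_{i,j} = \frac{x_{i,j} - \mu_i}{\sigma_i} \quad &\Longrightarrow \quad \mu_i = x_{i,j} - b_{i,j}\,\sigma_i \\
\hat{\mu}_{i,j} &= x_{i,j} - B_j\,\sigma_i \qquad (j \in J_i) \\
\hat{\mu}_i &= \frac{1}{|J_i|}\sum_{j \in J_i} \hat{\mu}_{i,j} \\
\hat{x}_{i,j} &= \hat{\mu}_i + B_j\,\sigma_i \qquad (j \notin J_i)
\end{aligned}
$$

The first line rearranges the definition of a per-value bias; the second replaces the per-value bias with the nominator's characteristic bias to obtain the mean implied by each usable value; the last two average those implied means and shift the result back by the bias of the nominator whose value is being generated.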
4 EXPERIMENT 1: CORRECTION OF MECHANICALLY LIMITED, HIGHLY BIASED AND IRREGULAR DISTORTION
This experiment shows:
1. How close to the correct values the algorithm stays when adjusting incorrect data.
2. How the algorithm corrects the values in the top left-hand corner of the table, showing that
even mechanical limitations of the data collection method, or highly distorted data, can be
overcome.
4.1 SETTING UP THE DATA

Structure n      N1      N2      N3      N4      N5     Var  Std dev
       1.47    1.00    1.00    1.00    2.00    3.00    0.80     0.89
       2.12    1.00    1.00    2.00    3.00    4.00    1.70     1.30
       2.92    1.00    2.00    3.00    4.00    5.00    2.50     1.58
       3.75    2.00    3.00    4.00    5.00    6.00    2.50     1.58
       4.58    3.00    4.00    5.00    6.00    7.00    2.50     1.58
       5.42    4.00    5.00    6.00    7.00    8.00    2.50     1.58
       6.25    5.00    6.00    7.00    8.00    9.00    2.50     1.58
       7.12    5.00    7.00    8.00    9.00   10.00    3.70     1.92
4.2 PROFILING EACH NOMINATOR

                         N1      N2      N3      N4      N5
Bias per value        -0.52   -0.52   -0.52    0.60    1.71
                      -0.86   -0.86   -0.09    0.68    1.44
                      -1.21   -0.58    0.05    0.69    1.32
                      -1.11   -0.47    0.16    0.79    1.42
                      -1.00   -0.37    0.26    0.90    1.53
                      -0.90   -0.26    0.37    1.00    1.63
                      -0.79   -0.16    0.47    1.11    1.74
                      -1.10   -0.06    0.46    0.98    1.50
Characteristic bias   -0.94   -0.41    0.15    0.84    1.54
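The profiling step can be checked against the printed figures with a short Python sketch. Two points are inferred from the numbers and are not stated in the text: the variance and standard deviation are the sample (n - 1) versions, and the printed "Structure n" column coincides with the average of the five nominator values and the variance taken together, rather than with the plain mean of the five values (1.60, 2.20, 3.00 and so on). The sketch adopts that inferred structure value only so that the output can be compared with the table; the variable names are hypothetical.

```python
from statistics import mean, stdev, variance

# Rows are variables, columns are nominators N1..N5 (values from section 4.1).
data = [
    [1, 1, 1, 2, 3],
    [1, 1, 2, 3, 4],
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
    [5, 7, 8, 9, 10],
]

# Assumption inferred from the printed numbers: "Structure n" is the average
# of the five values and their sample variance taken together.
structure = [(sum(row) + variance(row)) / (len(row) + 1) for row in data]
sigma = [stdev(row) for row in data]  # sample standard deviation (n - 1)

# Bias per value: distance from the structure value in standard deviations.
bias = [[(x - s) / sd for x in row]
        for row, s, sd in zip(data, structure, sigma)]

# Characteristic bias: each nominator's mean bias over all eight variables.
characteristic = [mean(column) for column in zip(*bias)]

print([round(s, 2) for s in structure])
print([round(b, 2) for b in characteristic])
```

Rounded to two decimals, the two printed lists can be compared directly with the "Structure n" column in section 4.1 and the characteristic bias row in section 4.2.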
4.3 THE ORIGINAL TABLE REPLACED WITH CHARACTERISTIC VALUES
            Characteristic values
New mean      N1      N2      N3      N4      N5
    1.68    0.63    1.10    1.60    2.22    2.84
    2.42    0.90    1.58    2.31    3.21    4.12
    3.29    1.44    2.27    3.15    4.25    5.35
    4.12    2.27    3.10    3.98    5.08    6.18
    4.96    3.10    3.93    4.81    5.91    7.01
    5.79    3.94    4.77    5.65    6.75    7.85
    6.62    4.77    5.60    6.48    7.58    8.68
    7.57    5.32    6.33    7.40    8.74   10.07
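The printed characteristic values are consistent with each one being the structure value shifted by the nominator's characteristic bias times the standard deviation of the variable, and with the new mean being the average of the five characteristic values in a row. This reading is inferred from the numbers rather than stated in the text, and the symbols are assumptions: $c_{i,j}$ is the characteristic value for nominator $j$ on variable $i$, $s_i$ the structure value, $B_j$ the characteristic bias and $\sigma_i$ the standard deviation. For the fourth variable and nominator N5:

$$c_{4,5} = s_4 + B_5\,\sigma_4 = 3.75 + 1.54 \times 1.58 \approx 6.18$$

and for the new mean of the same row:

$$\frac{2.27 + 3.10 + 3.98 + 5.08 + 6.18}{5} \approx 4.12$$

Other cells worked from the two-decimal figures can differ from the printed table by 0.01 because of rounding.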
4.4 CORRECTION PROPORTION

      N1      N2      N3      N4      N5
    0.37   -0.10   -0.60   -0.11    0.05
    0.10   -0.58   -0.15   -0.07   -0.03
   -0.44   -0.13   -0.05   -0.06   -0.07
   -0.14   -0.03    0.00   -0.02   -0.03
   -0.03    0.02    0.04    0.01    0.00
    0.02    0.05    0.06    0.04    0.02
    0.05    0.07    0.07    0.05    0.04
   -0.06    0.10    0.08    0.03   -0.01
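$$\overline{|p_{i,j}|} = 10\%$$
where $p_{i,j}$ is the correction proportion for nominator $j$ on variable $i$ and the bar denotes the mean over all entries of the table.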
The reliability of this approach is approximately 90%.
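The 10% figure can be checked from the printed tables with a short Python sketch. The formula used here, (original - characteristic) / original, is inferred from the printed proportions and is not given in the text; the variable names are hypothetical.

```python
from statistics import mean

# Rows are variables, columns are nominators N1..N5.
# Original values from section 4.1.
original = [
    [1, 1, 1, 2, 3],
    [1, 1, 2, 3, 4],
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [3, 4, 5, 6, 7],
    [4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9],
    [5, 7, 8, 9, 10],
]
# Characteristic values as printed in section 4.3.
characteristic = [
    [0.63, 1.10, 1.60, 2.22, 2.84],
    [0.90, 1.58, 2.31, 3.21, 4.12],
    [1.44, 2.27, 3.15, 4.25, 5.35],
    [2.27, 3.10, 3.98, 5.08, 6.18],
    [3.10, 3.93, 4.81, 5.91, 7.01],
    [3.94, 4.77, 5.65, 6.75, 7.85],
    [4.77, 5.60, 6.48, 7.58, 8.68],
    [5.32, 6.33, 7.40, 8.74, 10.07],
]

# Correction proportion: signed change relative to the original value
# (assumed form, inferred from the printed table).
proportion = [[(o - c) / o for o, c in zip(orig_row, char_row)]
              for orig_row, char_row in zip(original, characteristic)]

magnitudes = [abs(p) for row in proportion for p in row]
mean_correction = mean(magnitudes)
largest_correction = max(magnitudes)

print(f"mean |correction| = {mean_correction:.1%}")
print(f"largest |correction| = {largest_correction:.0%}")
```

The mean magnitude comes out just under 10%, and the largest single correction is the 60% applied to N3's first value.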
5 DISCUSSION
From the table in section 4.4 it can be seen that the algorithm corrected the significantly distorted
data by up to 60%, yet left the undistorted data almost untouched, with a correction of ~4%.
The actual error of the analysis was 10%.
Why is this valuable in general?
This algorithm can also be used when exact relationships are not known, but suspected, between
entities such as cities or industries. It does not require any consideration of weighting between
indirectly related variables as this does not affect the bias.
6 LIMITATIONS
The algorithm as described assumes that the profiles of the nominators are consistent across
structures. This does not need to be the case: treating each structure separately and creating a
weighting for each when determining the global bias allows the algorithm to be used with
inconsistent biases across multiple structures.
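A weighted global bias of this kind could be sketched as below. The text does not say how the weights are chosen, so both the function name `global_bias` and the choice of weights (here, the number of variables in each structure) are assumptions made purely for illustration.

```python
def global_bias(structure_biases, weights):
    """Weighted mean of one nominator's per-structure characteristic biases."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(b * w for b, w in zip(structure_biases, weights)) / total


# Hypothetical nominator: strongly negative in a small structure,
# mildly positive in a larger one.
biases = [-0.9, 0.3]
weights = [2, 8]  # assumed weighting: number of variables in each structure
print(round(global_bias(biases, weights), 2))
```

With these figures the larger structure dominates, so the global bias ends up slightly positive even though one structure shows a strong negative bias.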