2ndQuadrant 2014-5
PostgreSQL database
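-- list the relations (tables, views, sequences) in the database
\d
-- describe the columns of the sample_cars table
\d sample_cars
-- count the rows
SELECT count(*) FROM sample_cars;
-- show three rows (without ORDER BY, which rows is not guaranteed)
SELECT * FROM sample_cars LIMIT 3;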
Using Orange
Data stored in the database can be explored and visualized in Orange
Orange components
SQL Table
Load the data set by placing the SQL Table widget on the canvas,
opening it and setting the connection parameters (host, database, user, password, table)
You can download all the data into memory or, if it is too big, work with it remotely in the database
Data Info
Shows basic information
about the loaded table
Data Table
Visualizations - overview
Box Plot
Distributions
Continuous variables
  Scatter Plot (Scatter Map for big data)
  Linear Projection (more than two variables at a time)
Categorical variables
  Mosaic Plot
  Sieve Diagram
Visualizations
Box Plot
Shows basic statistics:
mean, standard deviation, median, quartiles,
minimum and maximum
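The statistics listed above can be reproduced outside Orange. The following is a minimal sketch with Python's standard `statistics` module, not Orange's own code; the input values are hypothetical, and the quartile interpolation (`method="inclusive"`) and the use of the sample standard deviation are assumptions, since the slide does not say which definitions the widget uses.

```python
import statistics

def box_plot_stats(values):
    """Summary statistics of the kind a box plot displays."""
    data = sorted(values)
    # Three cut points splitting the data into quarters (Q1, median, Q3);
    # "inclusive" interpolates linearly between observed values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation
        "median": median,
        "q1": q1,
        "q3": q3,
        "min": data[0],
        "max": data[-1],
    }

# Hypothetical values, not taken from sample_cars
values = [18.0, 15.0, 36.0, 24.0, 31.0, 14.0, 26.0, 22.0]
stats = box_plot_stats(values)
print(stats["q1"], stats["median"], stats["q3"])  # 17.25 23.0 27.25
```

Other quartile conventions (e.g. `method="exclusive"`) give slightly different Q1 and Q3 on small samples, so a box plot from another tool may not match these numbers exactly.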
Distributions
Scatter Plot
Sieve Diagram
Observe the co-occurrence
of values for pairs of variables
The diagram shows which combinations of values
are over- or under-represented
compared with what independence of the two variables would predict
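The comparison behind the sieve diagram can be sketched numerically: for every pair of values, take the observed count and the count expected if the two variables were independent (row total times column total divided by the number of instances). This is a minimal sketch of that computation only, not Orange's drawing code; the function name and the value pairs are hypothetical.

```python
from collections import Counter

def sieve_cells(pairs):
    """Map each (x, y) value combination to (observed, expected, observed/expected)."""
    n = len(pairs)
    observed = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    cells = {}
    for x, cx in count_x.items():
        for y, cy in count_y.items():
            # Count expected if the two variables were independent
            expected = cx * cy / n
            obs = observed[(x, y)]
            cells[(x, y)] = (obs, expected, obs / expected)
    return cells

# Hypothetical pairs of values of two categorical variables
pairs = [("x1", "y1"), ("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
for cell, (obs, exp, ratio) in sieve_cells(pairs).items():
    print(cell, obs, exp, round(ratio, 2))
```

A ratio above 1 marks an over-represented combination and a ratio below 1 an under-represented one.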
Sieve Diagram
Predictive models
Linear Regression
Model evaluation
Model evaluation
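The slides only name the widgets used for regression and evaluation. As a rough illustration of what cross-validated evaluation of a regression model means, here is a minimal sketch with a one-variable least-squares line, k-fold cross-validation and RMSE. It is not the Test and Score implementation; the fold count, the seed, pooling squared errors over all held-out folds and the data are assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cross_validated_rmse(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, predict the held-out fold, pool the squared errors."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical, exactly linear data: the RMSE should be (numerically) zero
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validated_rmse(xs, ys))
```

Scoring only on instances that were not used for fitting is what makes the estimate honest; scoring on the training data would reward models that merely memorize it.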
Classification
To classify instances into categorical classes,
set the target variable to origin,
evaluate some models with Test and Score and
see what kinds of mistakes they make
using the Confusion Matrix widget
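A confusion matrix simply counts how often each actual class was predicted as each class; off-diagonal cells are the mistakes. This is a minimal sketch of that counting, not the Confusion Matrix widget itself; the origin values and predictions below are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes (both in `labels` order)."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def accuracy(matrix):
    """Share of instances on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Hypothetical origin values and model predictions
labels = ["Europe", "Japan", "USA"]
actual = ["USA", "USA", "Japan", "Europe", "Japan"]
predicted = ["USA", "Japan", "Japan", "USA", "Japan"]
m = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, m):
    print(label, row)
print(accuracy(m))  # 0.6
```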
Model interpretation
Adjust the parameters of the
Classification Tree widget
to produce smaller trees,
e.g. min. instances in leaves = 20,
max. depth = 4
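To see why these two parameters shrink the tree, here is a toy tree learner (Gini impurity, binary splits on numeric features) that stops growing at a maximum depth and rejects any split that would leave fewer than a minimum number of instances in a leaf. It is a hypothetical sketch, not Orange's tree implementation; `max_depth`, `min_leaf` and the data are this example's own.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, max_depth=4, min_leaf=20, depth=0):
    """Grow a binary tree on numeric features. Growth stops at max_depth, when a
    node is pure, or when no split keeps at least min_leaf instances per side."""
    leaf = {"leaf": Counter(labels).most_common(1)[0][0], "n": len(labels)}
    if depth >= max_depth or len(set(labels)) == 1:
        return leaf
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if min(len(left), len(right)) < max(min_leaf, 1):
                continue  # split would create a leaf that is too small
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None or best[0] >= gini(labels):
        return leaf  # no admissible split reduces impurity
    _, f, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          max_depth, min_leaf, depth + 1)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}

def depth_and_smallest_leaf(tree):
    """Return (tree depth, size of the smallest leaf)."""
    if "leaf" in tree:
        return 0, tree["n"]
    dl, nl = depth_and_smallest_leaf(tree["left"])
    dr, nr = depth_and_smallest_leaf(tree["right"])
    return 1 + max(dl, dr), min(nl, nr)

# Hypothetical data: one numeric feature, class changes at x = 50
rows = [(x,) for x in range(100)]
labels = ["a" if x < 50 else "b" for x in range(100)]
tree = build_tree(rows, labels, max_depth=4, min_leaf=20)
print(tree["threshold"], depth_and_smallest_leaf(tree))  # 49 (1, 50)
```

Raising `min_leaf` or lowering `max_depth` removes candidate splits, so the resulting tree has fewer, larger leaves that are easier to read.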
Some visualizations (Box Plot, Distributions, Sieve Diagram) work the same
way on big data, but approximate the results on a subset of the data
Visualizations that show individual instances (Scatter Plot, Linear Projection, Heat Map)
usually show only a sample
or can be replaced by a modified version (e.g. Scatter Plot -> Scatter Map)
Most learning algorithms are not adapted to big data, but can be trained on
explicit data samples (obtained with the Data Sampler widget)
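The idea of working on a sample can be illustrated with simple random sampling without replacement: a statistic computed on the sample approximates the value on the full data. This is a minimal sketch with Python's `random` module, not the Data Sampler widget; the function name, seed, sizes and the stand-in "table" are hypothetical.

```python
import random
import statistics

def sample_rows(rows, size, seed=42):
    """Simple random sample of `size` rows, drawn without replacement."""
    return random.Random(seed).sample(rows, size)

# Hypothetical stand-in for a large table: one numeric column
population = list(range(100_000))
sample = sample_rows(population, 1_000)

# A statistic computed on the sample approximates the full-data value
print(statistics.mean(population), statistics.mean(sample))
```

A fixed seed makes the sample reproducible, which helps when a model trained on it has to be compared across runs.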