
Frontiers of

Computational Journalism
Columbia Journalism School
Week 1: Introduction, Clustering
September 16, 2016

Computational Journalism:
Definitions
Broadly defined, it can involve changing how stories are
discovered, presented, aggregated, monetized, and
archived. Computation can advance journalism by
drawing on innovations in topic detection, video analysis,
personalization, aggregation, visualization, and
sensemaking.
- Cohen, Hamilton, Turner, Computational Journalism, 2011

Computational Journalism:
Definitions
Stories will emerge from stacks of financial disclosure
forms, court records, legislative hearings, officials' calendars
or meeting notes, and regulators' email messages that no
one today has time or money to mine. With a suite of
reporting tools, a journalist will be able to scan, transcribe,
analyze, and visualize the patterns in these documents.
- Cohen, Hamilton, Turner, Computational Journalism, 2011

Cohen et al. model

[Diagram: the Cohen et al. model. Computer science enters the pipeline from Data through Reporting to the User at several points: CS applied to data for reporting, CS for presentation/interaction, and CS filtering stories for the user.]

Examples of filters

Facebook news feed


What an editor puts on the front page
Google News
Reddit's comment system
Twitter
Techmeme
New York Times recommendation system

http://snap.stanford.edu/nifty

Kony 2012 early network, by Gilad Lotan

CS in Journalism

[Diagram: CS applied throughout the pipeline: Data → Reporting → Filtering → User, plus tracking Effects.]

Journalism as a cycle

[Diagram: Data → Reporting → Filtering → User → Effects, feeding back into new Data, with CS at each stage of the cycle.]
Journalism with algorithms


vs.
Journalism about algorithms

Websites Vary Prices, Deals Based on Users' Information


Valentino-Devries, Singer-Vine and Soltani, WSJ, 2012

Message Machine
Jeff Larson, Al Shaw, ProPublica, 2012

Computer Science in Journalism


Reporting
Presentation
Filtering
Tracking
Algorithmic accountability

Computational Journalism:
Definitions
the application of computer science to the problems
of public information, knowledge, and belief, by
practitioners who see their mission as outside of both
commerce and government.
- Jonathan Stray, A Computational Journalism Reading List,
2011

Course Structure
Unit 1: Filters
Information retrieval, TF-IDF, topic modeling, search engines, social filtering, filtering
system design.

Unit 2: Interpreting Data


Quantification, error, statistical basics, Bayesianism, prediction, competing
hypotheses, narratives.

Unit 3: Methods
Visualization, knowledge representation, social network analysis, privacy and
security, tracking flow and effects

[Concept map of course topics: Information Retrieval, Visualization, Clustering, Natural Language Processing, Text Analysis, Sociology, Filter Design, Social Network Analysis, Artificial Intelligence, Knowledge Representation, Graph Theory, Drawing Conclusions, Cognitive Science, Statistics, Epistemology.]

Administration
Assignment after each class
Some assignments require programming, but
your writing counts for more than your code!

Course blog
http://compjournalism.com

Final project
for 6-pt students only

Grading
Dual degree students
Pass/Fail.
Final project: paper, story, or software.

Non-journalism students
80% assignments
20% class participation

Vector representation of objects


Fundamental representation for many data mining, clustering,
machine learning, visualization, NLP, etc. algorithms.

    x = (x1, x2, x3, …, xN)ᵀ

Each xi is a numerical or categorical feature
N = number of features, or dimension
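For example, as a minimal NumPy sketch (the feature values here are invented, purely illustrative):

```python
import numpy as np

# A hypothetical object represented as a feature vector x = (x1, ..., xN)
x = np.array([1.0, 0.0, 3.5, 2.0])

N = x.shape[0]  # N = number of features (the dimension)
```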

Choosing Features
Journalism: How do we represent the world numerically?

    (x1, x2, x3, …, xN)ᵀ  →  (xf(1), xf(2), …, xf(k))ᵀ,  where k ≪ N

Machine learning: Which variables carry the most information?

Examples of vector representations


Obvious
o movies watched / items purchased
o Legislative voting history for a politician
o crime locations

Less obvious, but standard


o document vector space model
o psychological survey results

Tricky research problem: disparate field types


o Corporate filing document
o Wikileaks SIGACT

What can we do with vectors?


Predict one variable based on others
o this is called regression
o or maybe "classification"
o supervised machine learning

Group similar items together


o This is clustering
o or maybe "classification" with unknown categories
o unsupervised machine learning

Classification and Clustering


Classification is arguably one of the most central and
generic of all our conceptual exercises. It is the
foundation not only for conceptualization, language,
and speech, but also for mathematics, statistics, and
data analysis in general.
- Kenneth D. Bailey, Typologies and Taxonomies: An

Introduction to Classification Techniques

Interpreting High Dimensional Data

UK House of Lords voting record, 2000-2012.


M = 1043 lords by N = 1630 votes
2 = aye, 4 = nay, -9 = didn't vote

Distance metric
Intuitively: how (dis)similar are two items?
Formally:
d(x, y) ≥ 0
d(x, x) = 0
d(x, y) = d(y, x)
d(x, z) ≤ d(x, y) + d(y, z)

Distance metric
d(x, y) ≥ 0
- distance is never negative

d(x, x) = 0
- reflexivity: zero distance to self

d(x, y) = d(y, x)
- symmetry: x to y same as y to x

d(x, z) ≤ d(x, y) + d(y, z)
- triangle inequality: going direct is shorter
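These four properties can be checked numerically for Euclidean distance, a minimal NumPy sketch (not from the slides; the test points are invented):

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance: a valid distance metric."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

x, y, z = np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([6.0, 0.0])

assert euclidean(x, y) >= 0                                   # never negative
assert euclidean(x, x) == 0                                   # reflexivity
assert euclidean(x, y) == euclidean(y, x)                     # symmetry
assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)   # triangle inequality
```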

Distance matrix
Data matrix for M objects of N dimensions:

    X = ( x1 )   ( x1,1  x1,2  ⋯  x1,N )
        ( x2 ) = ( x2,1  x2,2  ⋯  x2,N )
        (  ⋮ )   (  ⋮    ⋮    ⋱   ⋮  )
        ( xM )   ( xM,1  xM,2  ⋯  xM,N )

Distance matrix:

    Dij = Dji = d(xi, xj)

    D = ( d1,1  d1,2  ⋯  d1,M )
        ( d2,1  d2,2  ⋯  d2,M )
        (  ⋮    ⋮    ⋱   ⋮  )
        ( dM,1  dM,2  ⋯  dM,M )
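A toy version of both matrices, sketched with NumPy broadcasting (the three points are invented):

```python
import numpy as np

# M = 3 objects, N = 2 dimensions
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 4.0]])

M = X.shape[0]
# D[i, j] = d(x_i, x_j): an M-by-M symmetric matrix with zero diagonal
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

assert D.shape == (M, M)
assert np.allclose(D, D.T)           # symmetry
assert np.allclose(np.diag(D), 0.0)  # zero distance to self
```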

We think of a cluster like this

Real data isn't so simple

Different clustering algorithms


Partitioning
o keep adjusting clusters until convergence
o e.g. K-means

Agglomerative hierarchical
o start with leaves, repeatedly merge clusters
o e.g. MIN and MAX approaches

Divisive hierarchical
o start with root, repeatedly split clusters
o e.g. binary split

K-means demo

http://www.paused21.net/off/kmeans/bin/
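The K-means partitioning idea (keep adjusting clusters until convergence) can be sketched in a few lines of NumPy. This is an illustrative implementation, not the demo's code; the tiny dataset and fixed initial centers are invented:

```python
import numpy as np

def kmeans(X, init_centers, iters=10):
    """Plain K-means sketch: alternate nearest-center assignment
    and centroid update."""
    centers = init_centers.astype(float)
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(len(centers))])
    return labels, centers

# two obvious groups in 2-D
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centers = kmeans(X, X[[0, 4]])
```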

Agglomerative: merging clusters

    put each item into its own leaf node
    while num clusters > 1:
        find the two closest clusters
        merge them

Divisive: splitting clusters

    put all items into one cluster
    while num clusters < num items:
        find the largest cluster
        split it so the pieces are as far apart as possible

Cluster-to-cluster distance can be measured by:
- complete link, or "max"
- single link, or "min"
- average
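The agglomerative pseudocode, together with the single/complete link options, can be sketched as follows (an illustrative NumPy implementation; the four-point dataset is invented):

```python
import numpy as np

def agglomerative(X, num_clusters, link="single"):
    """Agglomerative clustering sketch: start with singleton clusters,
    repeatedly merge the two closest, measuring cluster distance with
    single link (min) or complete link (max)."""
    clusters = [[i] for i in range(len(X))]  # each item starts as its own leaf
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    while len(clusters) > num_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [D[i, j] for i in clusters[a] for j in clusters[b]]
                d = min(pairwise) if link == "single" else max(pairwise)
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)  # merge the two closest clusters
    return clusters

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
clusters = agglomerative(X, 2)
```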

Trees and Dendrograms

UK House of Lords voting clusters

UK House of Lords voting clusters


Algorithm instructed to separate MPs into five clusters. Output:
[Output: a cluster number (1–5) for each lord, e.g. 1 1 2 2 1 1 1 2 1 2 1 2 1 2 3 5 4 1 3 1 …]

Voting clusters with parties

[Output: each lord's party next to their assigned cluster, e.g. LDem 1, Con 1, Lab 2, Lab 2, Con 1 … Conservatives land almost entirely in cluster 1, Labour in cluster 2, Liberal Democrats mostly in cluster 1, while crossbenchers (XB) spread across clusters 1–4.]

Clustering Algorithm

Input: data points (feature vectors).
Output: a set of clusters, each of which is a set of points.

Visualization

Input: data points (feature vectors).
Output: a picture of the points.

Dimensionality reduction
Problem: the vector space is high-dimensional, up to
thousands of dimensions. The screen is two-dimensional.
We have to go from
    x ∈ ℝᴺ
to much lower-dimensional points
    y ∈ ℝᴷ,  where K ≪ N
Probably K = 2 or K = 3.

This is called "projection"

Projection from 3 to 2 dimensions

Linear projections
Projects in a straight line
to closest point on
"screen." Mathematically,
y = Px
where P is a K by N matrix.
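A minimal numeric check of y = Px, with a hypothetical P that keeps the first two coordinates and throws out the third:

```python
import numpy as np

# P is a K-by-N matrix; here K = 2, N = 3: drop the third coordinate,
# i.e. "look along" the z axis.
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

x = np.array([2.0, 3.0, 7.0])  # a point in 3-D
y = P @ x                      # its 2-D shadow on the screen
```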

Projection from 2 to 1 dimensions

Think of this as rotating to align the "screen" with


coordinate axes, then simply throwing out values of
higher dimensions.

Projection from 3 to 2 dimensions

Which direction should we look from?


Principal components analysis: find a linear projection
that preserves greatest variance

Take the first K eigenvectors of the covariance matrix,
corresponding to the largest eigenvalues. This gives a
K-dimensional sub-space for projection.
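That recipe, sketched in NumPy (an illustration under the slide's description, not the lecture's code; the synthetic data is invented):

```python
import numpy as np

def pca_project(X, K):
    """PCA sketch: project onto the K eigenvectors of the covariance
    matrix with the largest eigenvalues (greatest preserved variance)."""
    Xc = X - X.mean(axis=0)                          # center the data
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:K]]  # K largest
    return Xc @ top

# synthetic points that vary mostly along one 3-D direction
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t,
               0.1 * rng.normal(size=(100, 1)),
               0.01 * rng.normal(size=(100, 1))])
Y = pca_project(X, 2)
```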

Sometimes overlap is unavoidable

Real data isn't so simple

Nonlinear projections
Still going from high-dimensional x to low-dimensional y, but now
y = f(x)
for some nonlinear function f(). So it may not preserve
relative distances, angles, etc.

Fish-eye projection from 3 to 2 dimensions

Multidimensional scaling
Idea: try to preserve distances between points "as much as
possible."
If we have the distances between all points in a distance matrix,
    Dij = |xi − xj| for all i, j
we can recover the original {xi} coordinates exactly (up to rigid
transformations). Like working out a country map if you know how
far away each city is from every other.

Multidimensional scaling
Torgerson's "classical MDS" algorithm (1952)

Reducing dimension with MDS


Notice: the dimension N is not encoded in the distance
matrix D (it's M by M, where M is the number of points).
The MDS formula (theoretically) allows us to recover point
coordinates {x} in any number of dimensions k.
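Torgerson's classical MDS can be sketched directly from a distance matrix (a minimal NumPy illustration; the three collinear points are invented, chosen so 1-D recovery is exact):

```python
import numpy as np

def classical_mds(D, K):
    """Torgerson's classical MDS sketch: recover K-dimensional coordinates
    from an M-by-M distance matrix via double-centering and
    eigendecomposition."""
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:K]  # K largest eigenvalues
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# distances between three points on a line at positions 0, 3, 5
D = np.array([[0.0, 3.0, 5.0],
              [3.0, 0.0, 2.0],
              [5.0, 2.0, 0.0]])
Y = classical_mds(D, 1)
# the recovered 1-D coordinates reproduce the original distances
recovered = np.abs(Y[:, 0][:, None] - Y[:, 0][None, :])
```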

MDS Stress minimization


The formula actually minimizes "stress":

    stress(x) = Σi,j ( |xi − xj| − dij )²

Think of springs between every pair of points. The spring between
xi and xj has rest length dij.

Stress is zero if all high-dimensional distances are matched exactly
in the low dimension.

Multi-dimensional Scaling
Like "flattening" a
stretchy structure into
2D, so that distances
between points are
preserved (as much as
possible).

House of Lords MDS plot

Robustness of results
Regarding these analyses of congressional voting, we
could still ask:
Are we modeling the right thing? (What about other
legislative work, e.g. in committee?)
Are our underlying assumptions correct? (Do
representatives really have ideal points in a
preference space?)
What are we trying to argue? What will be the effect of
pointing out this result?

Why do clusters have meaning?

What is the connection between mathematical
and semantic properties?

No unique right clustering


Different distance metrics and clustering algorithms
give different results.
Should we sort incident reports by location, time,
actor, event type, author, cost, casualties?
There is only context-specific categorization.
And the computer doesn't understand your context.

Different libraries,
different categories
