learning
Na Lu
Xi’an Jiaotong University
An aphorism in machine learning:
Sometimes it's not who has the best
algorithm that wins; it's who has the
most data.
More data
• Transactions (eCommerce)
• Log records
• Social media, such as Weixin, Facebook, Twitter…
• Sensors, such as RFID, cameras…
• Scientific research: gene data, images…
• Smartphones
• Manufacturing
• …
Traditional databases: Relational
DB2
Characteristics of Big data
The fourth V: Veracity (uncertainty of the data)
More characteristics
• Report → Prediction
• Rational → Correlation
Q1 Q2
• 2012: 71% / 63%
• 2011: 69% / 58%
• 2010: 36% / 37%
(first figure: banking and financial markets respondents, n=124; second figure: global respondents, n=1144)
Source: IBM 2012
Big banks in the US
• Hadoop Common
• Hadoop Distributed File System (HDFS)
• Hadoop MapReduce
• HBase
• Hive
• NoSQL
• Pig: a high-level data flow language and execution
framework for parallel computation
• Cloudera: a company that supports the open-source tools
• Julia: popular on Wall Street
• Spark: the newest of these Big Data solutions. No reference.
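The MapReduce model named in the list above can be illustrated with a small word count. This is only a single-machine sketch in plain Python, assuming the usual map, shuffle, and reduce phases; it does not use Hadoop's API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for document in documents:
        # In a real cluster each document would be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data wins"]))
```

In a real deployment the map and reduce calls would run in parallel on many machines, with the framework handling the shuffle and the storage (for example on HDFS).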
Four different areas
• Financial industry
• Medical and health care
• eCommerce and retail
• Scientific research
Why Big data?
– Recognizing anomalies
• Unusual sequences of credit card transactions
• Unusual patterns of sensor readings in a nuclear power plant
– Prediction
• Future stock prices or currency exchange rates
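As a rough illustration of "recognizing anomalies", the sketch below flags unusual credit card transaction amounts with a modified z-score based on the median absolute deviation. This is one simple technique chosen for illustration, not a method stated in these slides; it looks at individual amounts rather than sequences, and the function name, the sample amounts, and the 3.5 threshold are assumptions.

```python
import statistics


def find_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    """
    median = statistics.median(amounts)
    mad = statistics.median(abs(x - median) for x in amounts)
    if mad == 0:
        # All values are (almost) identical; nothing can be scored.
        return []
    return [
        i
        for i, x in enumerate(amounts)
        if 0.6745 * abs(x - median) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical transaction amounts; the sixth one is far from the rest.
    transactions = [12.0, 15.5, 9.9, 14.2, 11.0, 950.0, 13.3]
    print(find_anomalies(transactions))
```

The same idea applies to sensor readings: replace the transaction amounts with a window of recent measurements.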
Neural Networks
• Auto-Encoder
• Restricted Boltzmann machine (RBM)
• Deep belief networks
• Convolutional neural networks
Deep Belief Network
• ImageNet
– 14,192,122 images, 21,841 categories
– Images found via web searches for WordNet noun synsets
– Hand-verified using Mechanical Turk
– Bounding boxes labeled for the query object
– New data for validation and testing each year
ILSVRC challenge
• WordNet
– Source of the labels
– Semantic hierarchy
– Contains large fraction of English nouns
– Also used to collect other datasets, such as Tiny Images (Torralba et al.)
– Note that categorization is not the end/only goal, so idiosyncrasies of WordNet may be less critical
• Taxonomy
ILSVRC challenge
• 5 minutes, 5.1%
• 15 minutes, ~3%
Face detection method
• Back in 2001, two computer scientists, Paul Viola and Michael
Jones, triggered a revolution in the field of computer face detection.
Thank you.