
TensorFlow AI

Max Kleiner, 11.05.2018


TensorFlow Cluster Agenda
• Install MongoDB & Graphviz
– For Sacred Experiment Recorder
– cluster-based counter statistic decorator callback algorithm
• Most experiment content with metrics
• What's behind TF? (pattern recognition)
• Cluster & Classify with different inputs, params, config
• Define label or topic ratings, convolution_layers[0]['size'],
– assumed/implicit labels, predict versus feature extraction
• Conclusions/Summary/Source
An agent or probe that collects threat data from the security sensor.
Normalization and correlation middleware. A console and associated database
for managing the solution and its alerts.
https://www.esecurityplanet.com/views/article.php/1501001/Security-Threat-Correlation-The-Next-Battlefield.htm
What is Graphviz?

• Graphviz is open source graph visualization software. Graph
visualization is a way of representing structural information as
diagrams of abstract graphs and networks. It has important
applications in networking, bio-informatics, software engineering,
database and web design, machine learning, and in visual interfaces
for other technical domains.

CASSANDRA System

• Probabilistic (PD, BN, BC, PLSA, LDA, ...)
– Independence assumptions made
• Distance-based (matching, VSIM, k-NN, CR, PageRank, Kmeans)
– Features used
– Structures exploited
• Model-based (rules, BC, BN, boosting)
• Social vs content
from keras.datasets import mnist
[Figure: sample handwritten digits 0 1 2 3 4 5 6 7 8 9]

The MNIST dataset comprises 70,000 handwritten numeric digit images
and their respective labels 0..9. There are 60,000 training images and
10,000 test images, all of which are 28 pixels by 28 pixels.
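
As a minimal sketch of how arrays of this shape are typically prepared for a convolutional network, the following uses NumPy only. The random array is an assumption that stands in for the result of `keras.datasets.mnist.load_data()`, so the snippet runs without downloading the data; the helper name `prepare_batch` is hypothetical and not taken from the slides.

```python
import numpy as np

NUM_CLASSES = 10            # labels 0..9
IMG_ROWS, IMG_COLS = 28, 28  # MNIST image size


def prepare_batch(images, labels):
    """Scale uint8 images to [0, 1], add a channel axis, one-hot encode labels."""
    x = images.astype("float32") / 255.0
    x = x.reshape(-1, IMG_ROWS, IMG_COLS, 1)
    y = np.eye(NUM_CLASSES, dtype="float32")[labels]
    return x, y


# Stand-in data (assumption): five random "images" instead of real MNIST digits.
rng = np.random.RandomState(0)
fake_images = rng.randint(0, 256, size=(5, IMG_ROWS, IMG_COLS)).astype("uint8")
fake_labels = np.array([0, 3, 9, 1, 7])

x, y = prepare_batch(fake_images, fake_labels)
print(x.shape, y.shape)
```

With the real dataset the same helper would be applied to the 60,000 training images and the 10,000 test images separately.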

• 76 english,
• 38 content="voa,
• 36 美国之音 "
• 74 special
• 74 voice
• 44 voa
• 36 америки,
• 36 голос

from module import class

from __future__ import division, print_function, unicode_literals
from sacred import Experiment
from sacred.observers import MongoObserver
from sacred.utils import apply_backspaces_and_linefeeds
import pymongo, pickle, os
import pydot as pdot
import numpy as np
import tensorflow as tf

from keras import backend as K

@ex.automain
def define_and_train(batch_size, epochs,
                     convolution_layers,
                     maxpooling_pool_size, maxpooling_dropout,
                     dense_layers, dense_dropout,
                     final_dropout, _run):
    from keras.datasets import mnist
    from keras.models import Sequential  # convolution
    from keras.layers import Dense, Dropout, Flatten, Conv2D,
    from keras.utils import to_categorical
    from keras.losses import categorical_crossentropy
    from keras.optimizers import Adadelta
    from keras import backend as K
    from keras.callbacks import ModelCheckpoint, Callback
GEO Cluster Demo

MongoDB
• Start the shell process mongod from
• Call from script
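
One way to call `mongod` from a script is sketched below, assuming `mongod` is on the PATH and that the data directory already exists. The helper names and the example directory are hypothetical; `--dbpath` and `--port` are standard `mongod` options, and 27017 is the default port.

```python
import subprocess


def build_mongod_command(dbpath, port=27017, executable="mongod"):
    """Return the argument list for starting a local MongoDB server."""
    return [executable, "--dbpath", dbpath, "--port", str(port)]


def start_mongod(dbpath, port=27017):
    """Launch mongod as a background process and return the Popen handle."""
    return subprocess.Popen(build_mongod_command(dbpath, port))


if __name__ == "__main__":
    # Hypothetical data directory; only the command line is printed here.
    print(" ".join(build_mongod_command("C:/data/db")))
```

Calling `start_mongod("C:/data/db")` would start the server; the returned handle can later be stopped with `terminate()`.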

MongoDB My Cluster sacred.runs & completed

Task Manager

https://pythonexample.com/search/mnist%20tensorboard%20demo/8
Task II

Task III

What's behind test? (backend pattern, crossentropy)
60000/60000 [==============================] - 426s 7ms/step - loss: 0.4982 - acc: 0.8510 - val_loss: 0.0788 - val_acc: 0.9749
Using TensorFlow backend.
INFO - MNIST-Convnet4 - Result: 0.9749
INFO - MNIST-Convnet4 - Completed after 0:07:27
Test loss: 0.0788029053777
Test accuracy: 0.9749
 59392/60000 [============================>.] - ETA: 5s - loss: 0.0571 - acc: 0.9829
 59520/60000 [============================>.] - ETA: 3s - loss: 0.0572 - acc: 0.9829
 59648/60000 [============================>.] - ETA: 2s - loss: 0.0572 - acc: 0.9829
 59776/60000 [============================>.] - ETA: 1s - loss: 0.0572 - acc: 0.9829
 59904/60000 [============================>.] - ETA: 0s - loss: 0.0573 - acc: 0.9829
 60000/60000 [==============================] - 513s 9ms/step - loss: 0.0573 - acc:
0.9829 - val_loss: 0.0312 - val_acc: 0.9891
 Using TensorFlow backend.
 INFO - MNIST-Convnet4 - Result: 0.9891
 INFO - MNIST-Convnet4 - Completed after 0:33:28
 Test loss: 0.0311644290059 13
 Test accuracy: 0.9891
 13
What's behind code? (keras, pymongo, graphviz)

db = pymongo.MongoClient('mongodb://localhost:27017/').sacred

print(tf.__version__)
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'

from tensorflow.examples.tutorials.mnist import input_data

ex = Experiment("MNIST-Convnet4")
ex.observers.append(MongoObserver.create())
ex.captured_out_filter = apply_backspaces_and_linefeeds

https://www.programcreek.com/python/example/103267/keras.datasets.mnist.load_data
PIP3 Install
pip3 install sacred
Collecting sacred
  Downloading https://files.pythonhosted.org/packages/2d/86/7be3afa4d4c1c0c76a5de03e5ff779797ab2654e377685255c11c13c0ea5/sacred-0.7.3-py2.py3-none-any.whl (82kB)

Collecting pymongo
  Downloading https://files.pythonhosted.org/packages/46/39/b9bb7fed3e3a0ea621a1512a938c105cd996320d7d9894d8239ca9093340/pymongo-3.6.1-cp36-cp36m-win_amd64.whl (291kB)
    100% |████████████████████████████████| 296kB 728kB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.6.1

https://github.com/pinae/Sacred-MNIST/blob/master/train_convnet.py
Cluster with different inputs, parameters, label
from collections import Counter
import matplotlib.pyplot as plx
import nltk, scipy, spacy
stemmer = nltk.SnowballStemmer("english")
from collections import defaultdict
from sklearn.cluster import KMeans
from gensim import corpora, models, similarities

# Dataset columns: 0:Title, 1:URL, 2:Tags, 3:Keywords, 4:Relevance, 5:Text, 6:Kword

train_size = 70
CLUSTERSIZE = 8  # 11
COMMONKEYWORDS = 90  # 2000
SETCOL = 6
DATSET = 15070
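
The slide does not show how `COMMONKEYWORDS` is used. One plausible reading, sketched here as an assumption, is a cut-off for the most frequent tokens counted with `collections.Counter`. The helper name and the toy documents are hypothetical.

```python
from collections import Counter

COMMONKEYWORDS = 3  # the slide uses 90; kept small here for readability


def most_common_keywords(documents, limit=COMMONKEYWORDS):
    """Count whitespace-separated lowercase tokens and return the top `limit`."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts.most_common(limit)


# Toy documents (hypothetical), not the crawled dataset from the talk.
docs = [
    "voice of america",
    "voice of america special english",
    "special english voa",
]
print(most_common_keywords(docs))
```

A real run would tokenize and stem the Text or Kword column first, for example with the `nltk` stemmer imported above.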

https://www.springboard.com/blog/data-mining-python-tutorial/
[0 1 2 1 1 1 1 0 0 1 1 0 0 2 1]
(array([0, 1, 2]), array([5, 8, 2], dtype=int64))
(array([0, 1, 2]), array([8, 5, 2], dtype=int64))
metrics.adjusted_mutual_info_score(labels1, labels2): 1.0
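
The two count arrays differ only in which cluster id carries which size, which is why a permutation-invariant score such as the adjusted mutual information reports 1.0. The sketch below reproduces the counts with NumPy. The second labelling is not shown on the slide, so it is assumed here to be the first one with ids 0 and 1 swapped; `same_partition` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

labels1 = np.array([0, 1, 2, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 2, 1])
# Assumed second labelling: the same clusters with ids 0 and 1 swapped.
swap = np.array([1, 0, 2])
labels2 = swap[labels1]


def same_partition(a, b):
    """True if a and b group the samples identically, ignoring label names."""
    pairs = set(zip(a.tolist(), b.tolist()))
    return len(pairs) == len(set(a.tolist())) == len(set(b.tolist()))


print(np.unique(labels1, return_counts=True))
print(np.unique(labels2, return_counts=True))
print(same_partition(labels1, labels2))
```

Comparing raw label arrays element by element would wrongly suggest disagreement, so cluster results are better compared with partition-based measures.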
https://backlinko.com/long-tail-keywords

The long tail keywords...

Create Questions (Method, Algos, Tools)
Dataset
"Finding the question is often more important than finding the answer." (John Tukey)

https://www.soovle.com/

https://answerthepublic.com/reports/
Machine Learning Process Chain
• Collab (Set a control thesis, understand the data, get resources such as Python, etc.)
• Collect (Scrapy data, store, filter data)
- TextCrawled_20180420_IR7_all.xlsx
• Consolidate (normalization and aggregation, filters, slice out irrelevant data or character-mapping problems)
• Cluster (k-means for categories, collocates for n-keywords – unsupervised algorithm)
• Classify (SVM, Sequential, Bayes – supervised)
• Conclude and Control (Predict or report context thesis and drive data to decision)

http://www.softwareschule.ch/examples/machinelearning.jpg

TASK12: Vectorise Data
Time Series Autocorrelation
https://spacy.io/

EXAMPLE: keyword gen., Cluster Framework

1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose the keyword generation method (a statistical measure) to calculate
the 'keyness' of the target file words. The default setting of
Log Likelihood is recommended.
4) Choose a significance value (p value) for the keyness statistic.
5) Choose an effect size measure to rank the keywords.
6) Choose a threshold for the number of keywords to be displayed.
7) Choose whether or not to view 'Negative Keywords' (target file words with
an unusually low frequency compared with the frequency in the reference
corpus).
https://www.princeton.edu/~otorres/Stata/statnotes

Keyword List
This tool shows which words are unusually frequent
(or infrequent) in the corpus in comparison with the words in a
reference corpus. This allows you to identify characteristic words
in the corpus, for example, as part of a genre or ESP study.
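
The Log Likelihood keyness measure mentioned in the steps above can be sketched as follows. This is the usual two-corpus log-likelihood (G2) formula, written as an illustration rather than as the tool's exact implementation; the word counts and corpus sizes are hypothetical.

```python
import math


def log_likelihood_keyness(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) for one word in a target vs. a reference corpus."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # 0 * log(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: a word over-represented in the target corpus ...
print(log_likelihood_keyness(76, 10000, 20, 100000))
# ... and a word with the same relative frequency in both corpora.
print(log_likelihood_keyness(10, 10000, 100, 100000))
```

A value above 3.84 corresponds to p < 0.05 at one degree of freedom, which is how the significance value chosen in step 4 turns into a cut-off on the statistic.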

School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku,
Tokyo 169-8555, Japan
Visualizing Cosine Distance

Similarity of doc a to doc b:

$$\mathrm{sim}(a,b) = \frac{\sum_{\text{word } j} v(a,j)\, v(b,j)}{\sqrt{\sum_{j'} v(a,j')^2}\;\sqrt{\sum_{j'} v(b,j')^2}} = A' \cdot B'$$

Let $A = \langle \ldots, v(a,j), \ldots \rangle$ and let $A' = \frac{A}{\|A\|}$, where $\|A\| = \sqrt{\sum_{j'} v(a,j')^2}$ (likewise for $B$ and $B'$).

[Figure: docs a, b, c and d drawn as vectors over the axes word 1, word 2, ..., word j, ..., word n]
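
The cosine formula above can be checked with a few lines of plain Python. This is a minimal sketch; the three term-count vectors are made-up examples and the function name is only illustrative.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


# Hypothetical term counts over three words.
doc_a = [2, 0, 1]
doc_b = [4, 0, 2]   # same direction as doc_a, twice the length
doc_c = [0, 3, 0]   # shares no words with doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Because the vectors are normalized, document length does not matter: doc_b scores as (numerically) identical to doc_a, while doc_c is orthogonal to it.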
Double Trouble
THE TEST OVERVIEW
Status       Description
QUEUED       The run was just queued and not run yet
RUNNING      Currently running (but see below)
COMPLETED    Completed successfully
FAILED       The run failed due to an exception
INTERRUPTED  The run was cancelled with a KeyboardInterrupt
TIMED_OUT    The run was aborted using a TimeoutInterrupt
[custom]     A custom py:class:`~sacred.utils.SacredInterrupt` occurred

File "C:\Users\max\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 254, in calinski_harabaz_score
    intra_disp += np.sum((cluster_k - mean_k) ** 2)
MemoryError

  No. of URLs removed            76,732,515
+ No. of robots.txt requests      3,675,634
- No. of excluded URLs            3,050,768
= No. of HTTP requests           77,357,381
  HTTP requests without response  1,763,850
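
To get an overview of how many runs ended in each status, the run documents can be tallied. The sketch below assumes, as the `sacred.runs` collection named earlier suggests, that each stored run document carries a `status` field; the sample documents and the helper name are hypothetical.

```python
from collections import Counter


def count_statuses(run_docs):
    """Tally run documents by their 'status' field."""
    return Counter(doc.get("status", "UNKNOWN") for doc in run_docs)


# Hypothetical run documents standing in for a MongoDB query result.
sample_runs = [
    {"_id": 1, "status": "COMPLETED"},
    {"_id": 2, "status": "FAILED"},
    {"_id": 3, "status": "COMPLETED"},
    {"_id": 4, "status": "INTERRUPTED"},
]
print(count_statuses(sample_runs))

# Against a live database (not executed here), with db as defined earlier:
# count_statuses(db.runs.find({}, {"status": 1}))
```

A FAILED run such as the MemoryError above would show up in this tally without having to open each run individually.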
SUMMARY & QUESTIONS
Which Stat / TF Package? Proposal: Keras, Bayes & KMeans
• Mindtoolset: https://basta.net/speaker/max-kleiner/

• KMeans-Watson-ElasticSearch-SQLServer-Scrapy-TensorFlow-SVM-RandomForest-Sacred-MongoDB
• https://sacred.readthedocs.io/en/latest/tensorflow.html

• https://www.dewresearch.com/

• https://ofai.github.io/million-post-corpus/

• singularitynet.io