
Big Data, Smart Data, Dumb Data and Security Intelligence

Fred Wilmot, Global Security Practice Manager

Agenda
Simple Big Data
A Child Becomes a Teenager
Can Data Science Solve Business Problems
Visibility and Insight
Visibility and Insight, Context and Analytics
A Use Case for Smart Data

What is Big Data Really?

"When the size of the data itself becomes part of the problem"*
* Mike Loukides, O'Reilly Radar

Big Data, Get Used To IT

It's Not Just About the Bigness


Data Diversity
Data Volume Traditional Tools

Desired Time to Answer



Big Data Comes from Machines


Volume | Velocity | Variety | Variability

Machine data is the fastest growing, most complex, most valuable area of big data

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

Big Data Technologies


TeraData Greenplum Hadoop Cassandra CouchDB MongoDB Redis

Unstructured Platforms

RDBMS Sharding

SQL & Map / Reduce

HDFS Storage + Map / Reduce

NoSQL Map / Reduce

Real Time Indexing Alerting Analytics

Relational Database (highly structured)

Distributed File System (semi-structured)



Key/Value, Columnar or Other (semi-structured)

Temporal, Unstructured Heterogeneous

A Child Becomes a Teen

Cute Kid to Frustrating Teenager

Data Science is more than storing data in HDFS, NoSQL, or a cloud offering. It's about getting value, insight, and analysis out.

Troughs


Data Science and Statistics


"The sexy job in the next 10 years will be statisticians."
Hal Varian, Chief Economist at Google

Data scientists are the next-generation analytics professional; they are responsible for turning the data into insight. They extract meaning from Big Data to help the business.

Big Data Roles and Their Challenges

Data Analyst
Conduct analytics on structured data with traditional tools

Data Architect
Guide designs, set standards, and manage developers

Data Scientist
Extract meaning from big data to help the business

Developer
Build scalable applications based on data in Big Data platforms

Accessing and using data in Hadoop

Keeping pace with evolving ecosystem



Low-level tools and low productivity

Dev focus on integration and low-level MapReduce

What About the Business Problem?

Data Scientarchilystoper

Security professionals need ALL these skills



Required - A Broader Look at Security Data


The Data Driven Business

Increasing Data Amounts

Industrial Control Systems data (SCADA)

The Internet of Things

Traditional IT Security Data

Cloud Security Data

Mobile Security Data

Increasing Threat Complexity



Big Data Analysis Demands


Data Integration

Advanced Analytics
Algorithmic Engine
Knowledge Management
Collaboration Platform
Data Driven Thinking

Supporting Context and insight


Big Data Security Requirements


Now, the hard part for Business:
Data experts to train expert systems do not grow on trees
Capacity and scale for consumption cost money, for unproven analytics
Adding responsibility to an under-resourced workforce limits impact
How do I visualize insight?

You need the whole gang

Visibility and Insight



Delivering Visibility but What about Insight?

Big Data + Analytics = NOT ENOUGH


Visibility with Context


Big Data + Analytics + Contextual Insight = Smart Data
Product Analytics: Understand the relationship of product feature effectiveness to trend line of product success
Customer Analytics: Native mobile app feature adoption and engagement by social class using all handsets
Security Analytics: Top malicious watchlist domains visited across company associated with campaign, locality, and user class request

Show Me

Let's Look at Low-Value, High-Volume Data

A DNS Botnet Example

Dynamic DNS Fast Flux


Botnet Behavior


Common Challenges

Better awareness of ecosystems (how large? geographic understanding? visibility into C2 servers, signatures, and attribution)
Takedown services (identify and degrade hostile botnets prior to an attack, more law enforcement, law enforcement/agreements to stop attacks)
Proactive ISP assistance (ASNs, router/flow data)
Full view for geographic perspective: what controls, IPs, protocols
More visibility into global actors: capabilities, weapons of choice, etc.
Sharing threat intelligence built into multiple vendor products across others
Real-time and proactive DDoS forecasting, behavioral modeling with historical context
Deep technical expertise

What Would Scooby Do?

How do I know it's bad? Where do I start? That's a lot of data!

We have some data analysis needs to be met


DNS data (50,000 records/sec for the entire day)
Proxy data (200,000 unique URL requests for the entire day)
Network flow data, maybe deep packet inspection (easily in excess of 10 TB)
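
To get a feel for these volumes, a quick back-of-the-envelope calculation helps. The sketch below assumes the 50,000 records/sec DNS rate is sustained for a full 24 hours. The 100-byte average record size is a hypothetical figure chosen only for illustration, not a number from this deck.

```python
# Back-of-the-envelope sizing for one day of DNS logs.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds

dns_records_per_sec = 50_000            # rate quoted on the slide
avg_record_bytes = 100                  # ASSUMPTION: illustrative size only

records_per_day = dns_records_per_sec * SECONDS_PER_DAY
bytes_per_day = records_per_day * avg_record_bytes

print(f"DNS records per day: {records_per_day:,}")
print(f"Approximate raw size: {bytes_per_day / 1e9:.0f} GB")
```

Under these assumptions the DNS feed alone is several billion records per day, which is why the choice of heuristics and context matters before any analysis starts.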

We also need to add some context?


Watchlist activity/Threat intelligence
System type/Application
Frequency of communication
Locality of request
Risk model

DNS Analytics


Analytics + Context
After looking at our 24 hours of DNS traffic, we applied some heuristics to get an idea of what is botnet traffic and what is legitimate.
Now we have some analytics. What next?
Let's compare with our watchlist of malicious domains
Let's look for new requests we haven't seen before
Let's look for requests with the same variance since the last request
Let's combine insight from proxy logs to validate potential bad traffic by domain, IP, or top-level domain
Let's use some geospatial
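
The first three context checks above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the talk. The record layout, the sample domains, the `watchlist` and `seen_before` sets, and the `jitter` tolerance are all assumptions made for the example. Reading "the same variance" as a near-constant interval between requests is also an interpretation.

```python
from collections import defaultdict

# Hypothetical DNS log records: (epoch_seconds, client_ip, queried_domain)
dns_log = [
    (1000, "10.0.0.5", "update.example.com"),
    (1060, "10.0.0.5", "c2.badhost.example"),
    (1120, "10.0.0.5", "c2.badhost.example"),
    (1180, "10.0.0.5", "c2.badhost.example"),
    (1240, "10.0.0.5", "c2.badhost.example"),
    (1300, "10.0.0.9", "brand-new.example.net"),
    (1350, "10.0.0.9", "update.example.com"),
]

watchlist = {"c2.badhost.example"}      # ASSUMPTION: known-malicious domains
seen_before = {"update.example.com"}    # ASSUMPTION: domains seen on earlier days


def watchlist_hits(log, watchlist):
    """Domains in the log that also appear on the watchlist."""
    return {domain for _, _, domain in log if domain in watchlist}


def new_domains(log, seen_before):
    """Domains requested today that were never seen before."""
    return {domain for _, _, domain in log if domain not in seen_before}


def periodic_requesters(log, min_requests=4, jitter=1):
    """(client, domain) pairs whose requests arrive at a near-constant interval."""
    times = defaultdict(list)
    for ts, client, domain in log:
        times[(client, domain)].append(ts)

    flagged = set()
    for key, stamps in times.items():
        if len(stamps) < min_requests:
            continue
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        # A tiny spread between gaps suggests automated beaconing.
        if max(gaps) - min(gaps) <= jitter:
            flagged.add(key)
    return flagged


print(watchlist_hits(dns_log, watchlist))
print(new_domains(dns_log, seen_before))
print(periodic_requesters(dns_log))
```

Each check is cheap on its own. The value comes from combining them, for example treating a domain that is new, on the watchlist, and requested periodically as a much stronger signal than any single hit.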

Analytics + Context

All DNS requests flagged by the botnet heuristics are run through Threat Intelligence watchlists for matches

Threat Intelligence
Common Challenges

Sharing threat intelligence across multiple vendors

Collaborating on Internal Threat Intelligence across the business


Adding context to real-time analysis in an automated fashion for disparate tools is hard
Associating threat profiles based on locality, frequency, affinity and complexity, to business risk
Curating Threat Intelligence over time
Automation, and iteration as part of a mature Security Operations model

External Threat Intelligence Integration


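# Python 3: http.client and urllib.parse replace Python 2's httplib and urllib
import http.client
import urllib.parse

# Form-encoded query for a single IP address
params = urllib.parse.urlencode({
    'apikey': 'f6dbdee2dc8c6118933b90178657877cc2cede3023ce0eba4xxxxxxxxxxxxxxx',
    'ip': '46.229.160.7',
    'method': 'ipq'})
headers = {"Content-type": "application/x-www-form-urlencoded",
           "Accept": "application/json"}

# POST the query to the IPViking API endpoint
conn = http.client.HTTPConnection("us.api.ipviking.com:80")
conn.request("POST", "/api/", params, headers)

# Print the HTTP status, then the raw response body
response = conn.getresponse()
print(response.status, response.reason)
data = response.read()
print(data)
conn.close()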

Insight For the Win!


Lookup Table of Largest DNS Request by Country, and geolocation Server DNS associations with other systems performing their Function

Reducing the TTL for these DNS entries will help prevent a targeted attack from Syria, now that we know what causes it

Hybridization
Internal apps, customer-facing apps, mobile apps
Analysis tools (SAS, SPSS, R, Tableau)

Data Services (REST, WS) Pig Hive HBase

Relational DBs

MapReduce HDFS
ETL
Time Series

Enterprise DW
Real-time analytics

Web

Files

Social

Logs

ERP

CRM


Augment Big Data with Smart Data


Extend context with lookups and external data sources.

LDAP, AD

Watch Lists

CMDB

Message Stores

Reference Lookups

Correlate across multiple data sources and data sets using indexes and keys
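
Correlating on a shared key can be sketched as a simple lookup join. The example below is illustrative only. The event fields, the `asset_db` table (standing in for a CMDB), the `user_dir` table (standing in for LDAP/AD), and the `enrich` helper are hypothetical names, not part of any product API.

```python
# Hypothetical lookup tables keyed on fields that also appear in the events.
asset_db = {  # stands in for a CMDB, keyed by IP address
    "10.0.0.5": {"host": "hr-laptop-17", "system_type": "workstation"},
    "10.0.0.9": {"host": "web-01", "system_type": "server"},
}
user_dir = {  # stands in for LDAP/AD, keyed by user name
    "alice": {"department": "HR"},
}
watchlist = {"c2.badhost.example"}  # stands in for a threat watch list


def enrich(event):
    """Return a copy of the event with context from each lookup merged in."""
    enriched = dict(event)
    enriched.update(asset_db.get(event.get("src_ip"), {}))
    enriched.update(user_dir.get(event.get("user"), {}))
    enriched["on_watchlist"] = event.get("domain") in watchlist
    return enriched


event = {"src_ip": "10.0.0.5", "user": "alice", "domain": "c2.badhost.example"}
print(enrich(event))
```

Keys that are missing from a lookup simply add no context, so partially matched events still flow through with whatever enrichment is available.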

Drive Insight through Collaboration

1. Start with Security Ops model
2. Big Data + Internal context data
3. Add Threat Intelligence
4. Collaboration/Communication
5. Automation
6. Iteration

"Our approach incorporates the use of a NoSQL database as the key element in the mitigation of these three tension points (see Figure 1)."

Figure 1: Conceptual diagram of the approach used for our cyber defense system illustrating the central role served by our NoSQL database.

* Excerpt: Using NoSQL Databases for Streaming Network Analysis
Brian Wylie*, Daniel Dunlavy*, Warren Davis IV*, Jeff Baumes**
*Sandia National Laboratories, **Kitware Inc

Cyber Operations Model


Incident Handling
Monitor Security Technologies
Malware Analysis
Cyber Network Defense
Forensics and Root Cause Analysis
Fraud and Theft Analysis
Behavioral Analysis
Threat Intelligence
REACTIVE

Incident Response Process Collaboration Iteration Automation

Compliance Reporting and Audit

Application Security design

Secure Coding & Development Vulnerability Remediation

Network Security Design


Threat Modeling

Cyber Network Offense


PROACTIVE

Industrial Data Explosion

INDUSTRIAL DATA & THE INTERNET OF THINGS

The NEXT WAVE


STRUCTURED DATA

MACHINE DATA

Medical Devices Driving Better Patient Insights


Device

Manufactured

Tracking Medical Device Supply Chain to Drive Critical Insights

Patient Behavior

Prescribed
to patient

Prescription Patterns

Shipped

to Physician

Returned
to iRhythm

Supply Chain Analytics

Capture Energy, Environmental, and Operational data

Analyze Building Sensors to Cut Energy Costs

Develop Deep Understanding of Building


Enhance Efficiencies and Reduce Costs

Operational Intelligence Leads to More Efficient, Better Performing Buildings



Aggregate Data from Vehicles Remotely

Vehicles' Acceleration, Braking, Battery Charge and Location

Cars as Telemetry Sensors

Insights into customers' driving habits

Frequency of charging and charging locations

Manage the impact on the power grid

Optimize Charging Infrastructure Shape Next-gen Electric Vehicles

Mining Electric Car Big Data

Thank You!

Fred@Splunk.com

Further Reading
OpenDNS Dynamic DNS Fast Flux

DNS

Botnets
http://www.elsevierdirect.com/companions/9781597491358/casestudies/DNS.pdf
http://www.syssec-project.eu/media/page-media/3/dietrich-ec2nd11.pdf


Mobile Methodology
Client Application
Static Analysis

Web Application
Static Analysis

Network
Dynamic Analysis Dynamic Analysis

A Mobile App RE into .json

{"tags": {"UTIL": ["Lcom/jumptap/adtag/actions/BrowserAdAction;", "Lcom/inmobi/androidsdk/ai/container/IMWebView$2;", "Lcom/flurry/android/ai;", "Lcom/jumptap/adtag/media/JTMediaPlayer;", "Lcom/jumptap/adtag/media/JtVideoAdView$3;", "Lcom/jumptap/adtag/media/JtVideoAdView$4;", "Lcom/inmobi/androidsdk/IMAdInterstitial$1$1;", "Lcom/jumptap/adtag/media/JtVideoAdView;", "Lcom/rovio/ka3d/GLSurfaceView$DefaultContextFactory;", "Lcom/millennialmedia/android/MillennialMediaView;", "Lcom/jumptap/adtag/utils/JtAdFetcher;", "Lcom/jumptap/adtag/utils/JtAdManager;", "Lcom/burstly/lib/component/networkcomponent/burstly/ormma/OrmmaDisplayController;", "Lcom/inmobi/androidsdk/ai/container/IMWebView$TimeOut;", "Lcom/burstly/lib/component/networkcomponent/burstly/ormma/OrmmaSensorController;", "Lcom/jumptap/adtag/utils/JtSettingsParameters;", "Lcom/flurry/android/v;", "Lcom/inmobi/androidsdk/ai/controller/JSAssetController;", "Lcom/jumptap/adtag/JtAdView;", "Lcom/millennialmedia/android/BasicMMAdListener;", "Lcom/google/ads/AdActivity;", "Lcom/millennialmedia/android/HandShake$AdTypeHandShake;

Strings and Androguard


Splunk Shows Malicious Apps by Behavior


Lookup Using Key Value Persistent Cache


Download and install Redis

Download and install Redis Python module


Import Redis module in Python and populate key-value DB

Import Redis module in the lookup function given to Splunk to look up a value given a key

Redis is an open source, advanced key-value store.

Redis Lookup
import sys

### CHANGE PATH according to your Redis install ###
sys.path.append('/Library/Python/2.6//redis-2.4.5-py.egg')
import redis

def main():
    # Connect to Redis; change host and port for your distribution
    pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redp = redis.Redis(connection_pool=pool)

Redis Lookup (cont.)


def lookup(redp, mykey):
    # Return the cached value for mykey, or None if the lookup fails
    try:
        return redp.get(mykey)
    except Exception:
        return None

Combine Persistent Cache with External Lookup


For data that is relatively static:
First see if the data is in the persistent cache
If not, look it up in the external source such as a database or web service
If results come back, add results to the persistent cache and return results
For data that changes often, you will need to create your own cache retention policies
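
The lookup flow above can be sketched as follows. To keep the example self-contained, a plain dictionary stands in for the persistent cache (Redis in this deck), and `external_lookup` is a hypothetical placeholder for the database or web service call. The sample key and value are made up for illustration.

```python
persistent_cache = {}  # ASSUMPTION: stand-in for a Redis key-value store

# ASSUMPTION: stand-in for a slow external source (database or web service)
EXTERNAL_SOURCE = {"192.0.2.10": "example-risk-label"}
stats = {"external_calls": 0}


def external_lookup(key):
    """Hypothetical slow lookup; counts calls so cache hits are visible."""
    stats["external_calls"] += 1
    return EXTERNAL_SOURCE.get(key)


def cached_lookup(key):
    """Check the cache first, fall back to the external source, then cache."""
    if key in persistent_cache:
        return persistent_cache[key]
    result = external_lookup(key)
    if result is not None:
        # Only successful lookups are stored in the cache
        persistent_cache[key] = result
    return result


print(cached_lookup("192.0.2.10"))  # first call goes to the external source
print(cached_lookup("192.0.2.10"))  # second call is served from the cache
print(stats["external_calls"])
```

This sketch never expires entries, which matches the "relatively static" case. Data that changes often would need a retention policy on top, for example a per-key expiry time.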
