Clustering Customers for Machine Learning With Hadoop and Mahout

05.10.2015

This post was originally published in my company's tech blog Simplybusiness

The problems


We manage quite a bit of customer data, starting from the beginning of a customer's search for a
new insurance policy, all the way until they buy (or don't buy) the policy. We keep all of this data,
but for the most part, we don't do anything to improve our customer offering.
Our site looks exactly the same for every customer -- we don't try to engage with them on a more personal level. No customisation exists, which means that each customer's experience doesn't adapt to his or her personality, specific trade, or home or business location. Nothing at all.
Filling out a long form is always boring. But filling it out while being unsure of what information to put where, and being forced to make a phone call to confirm details, is even more boring. In many cases, it could mean the customer just gets bored and leaves our form.

The idea
We wanted to create and test a solution that allowed us to group together similar customers using
different sets of dimensions depending on the information we wanted to provide or obtain. We
thought about introducing clustering technology and algorithms to group our customers.
This would be a very rough implementation that would allow us to prove certain techniques and solutions for this type of problem -- it certainly would NOT cover all the nuances that machine learning algorithms and analysis carry with them. Many liberties were taken to get to a proof of concept. The code presented here is not 100% the same code used in the spike, but it forms a very close approximation.
This post covers the implementation of the solution.


Solution
Setting up the Clustering backend algorithms to allow multidimensional clustering.
I had already decided that I would put into practice my knowledge of Mahout and Hadoop to run the clustering processing. I installed Hadoop using my own recipes from hadoop-vagrant, first on a local Vagrant cluster and then on an AWS cluster.
Hadoop is a framework that allows the processing of certain types of tasks in a distributed environment using commodity machines, which allows it to scale massively and horizontally. Its main components are the map-reduce execution framework and the HDFS distributed filesystem. For more details, check out my blog post.
Getting the data:
After Hadoop was installed, the first task was to find and extract the data. The data was stored in a SQL Server database, so we needed to fetch it and put it into HDFS. There is a fantastic tool called Sqoop that's built just for this. Sqoop not only allows you to get data ready for HDFS, it actually uses Hadoop itself to parallelize the extraction of the data. The standard way to run Sqoop is as follows:
sqoop import --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect "jdbc:sqlserver://xxx:1433;username=xxx;password=xxx;databaseName=xxx" \
  --query "select xxx from xxxx" --split-by customer._id --fetch-size 20 \
  --delete-target-dir --target-dir aggregated_customers \
  --package-name "clustercustomers.sqoop" --null-string '' --fields-terminated-by ','
The previous command generates the required Hadoop-compatible files that will be used in the subsequent analysis. The most important part is the query that you want to use to extract the data. In our case, in the first iteration, we extracted information like trade, vertical, claims, years_insured, and turnover. These values are the dimensions that we will use to group our "similar" customers.
K-Means Clustering.
I have read quite a bit about different machine learning techniques and algorithms. I have
developed a bit with them in the past, particularly in the recommendation area. The first thing to decide with a Machine Learning problem is what exactly I want to achieve. First, let's look at the three main problems that Machine Learning solves, and then follow the reasoning behind my choices.
Machine Learning algorithms in Mahout can be broadly categorized in three main areas:
Recommendation Algorithms: Try to make an informed guess about what things you might
like out of a large domain of things. In the simplest and most common form, the inference is
done based on similarity. This similarity could be based on items that you've already said you
like, or similarity with other users that happen to like the same items as you.
Assume we have a database of movies, and say you like Lethal Weapon.
Item-Based similarity: recommendations for movies similar to Lethal Weapon.
User-Based similarity: recommendations for movies that other people who liked Lethal Weapon also liked.
Classification Algorithms: These are in the family of Supervised Learning algorithms (supervised because the set of resolutions and categories is known beforehand). Classification algorithms allow you to assign an item to a particular category given a set of known characteristics (where the category belongs to a limited set of options).
This technique is used in spam detection systems.
Let's say you decide that any email with at least two of the following characteristics should be marked as Spam: 4 or more images, 4 words written in all-capital letters, and the text 'congratulations' with an exclamation mark at the end. Anything with fewer than two of these is not Spam.
The classification system will be built upon these characteristics and rules. It knows that any incoming email that matches these rules will belong to the corresponding category.
Clustering Algorithms: These belong to the Unsupervised Learning family because there is no
predetermined set of possible answers. Clustering algorithms are simply given a set of inputs
with dimensions. The algorithm itself works out how to organize and group the data into
individual clusters.
Given the previous definitions, it was very clear to me that what we needed was a Clustering solution, because I didn't have any idea how the data was supposed to be organized. I wanted the system to figure out the clustering and return a set of groups containing similar customers in each of them.
I selected K-Means clustering as the clustering algorithm I wanted to use. I have some familiarity with it, and it's the most common clustering algorithm in use and in the literature.
To quickly explain K-Means: given a set of N elements and a number of resulting clusters K, it finds K centroid points (initially at random) and iterates up to X times, finding the elements of N that are closest to each centroid k and grouping them together. In each iteration the centroids are recalculated and the elements are assigned to the cluster of their closest centroid. A better explanation is here.
From the previous explanation, you can see that K-Means expects the number of clusters K as an input. However, I had no idea at all what a good number of clusters would be. In Mahout, you can combine K-Means with another clustering algorithm named Canopy. Canopy is capable of finding a first set of K centroids that can then be fed into K-Means.
The way Canopy works is roughly the following: instead of being given a K for the total number of clusters, you provide Canopy with a measure of the size that you expect each cluster to have. This allows you to get different-sized clusters on different runs of the algorithm (i.e. if you want to cluster people by a wider or narrower geographical location). The algorithm works by using a distance measure (most machine learning algorithms use this to find similarities) and a couple of threshold values, T1 and T2 (with T1 > T2). Looping through all the points in the dataset, the algorithm takes each point and compares it against each of the already created Canopies. If it is within T2 of a Canopy's center, it is assigned to that existing Canopy. If it is within T1 but not T2, it is added to the Canopy but is still allowed to become part of another Canopy. If it is within neither, it is used as the center of a new Canopy. At the end, after all Canopies are created, the centroid is calculated for each of them. These will be the centroids used for the next step using K-Means.
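To make that procedure concrete, here is a minimal standalone sketch of the Canopy idea in plain Java. This is not Mahout's implementation; the plain Euclidean distance and the class and method names are illustrative only.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the canopy procedure described above; not Mahout's implementation.
public class CanopySketch {

    static class Canopy {
        final double[] center;                        // fixed when the canopy is created
        final List<double[]> points = new ArrayList<>();
        Canopy(double[] center) { this.center = center; points.add(center); }
    }

    // Plain Euclidean distance; the real spike uses a weighted measure instead.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    // T1 > T2: within T2 a point is bound to one canopy; within T1 it may belong to several.
    static List<Canopy> buildCanopies(List<double[]> data, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] point : data) {
            boolean nearSomeCanopy = false;
            for (Canopy c : canopies) {
                double d = distance(point, c.center);
                if (d < t2) { c.points.add(point); nearSomeCanopy = true; break; } // tightly bound
                if (d < t1) { c.points.add(point); nearSomeCanopy = true; }        // loose membership
            }
            if (!nearSomeCanopy) canopies.add(new Canopy(point));                  // start a new canopy
        }
        return canopies;
    }

    // After all canopies are built, the mean of each canopy's points becomes a seed centroid for K-Means.
    static double[] centroid(Canopy c) {
        double[] mean = new double[c.center.length];
        for (double[] p : c.points)
            for (int i = 0; i < mean.length; i++) mean[i] += p[i] / c.points.size();
        return mean;
    }
}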
Deciding on and weighting the dimensions
For K-Means, Canopy, and most Machine Learning algorithms, the way to find whether a particular item belongs to a group, or how to make a recommendation, is based on a measure of distance. To be able to measure the distance between two items (customers, in our use case), their values need to be converted into a form that allows them to be compared and measured. This means that we need to convert our values, whatever they are, to a numeric representation that allows us to use traditional distance measure algorithms to compare them.
The dimensions I planned on using for this first iteration were:
product
trade
turnover
employees
claims
Out of these dimensions, only 2 were already numbers (turnover and claims); the other 3 were text values. Even trickier, only employees had directly comparable values ("less than 5 employees" is comparable to "more than 100 employees"), while the other 2 were discrete, disjoint values (product "business" is not directly comparable to product "shop").

For the case of employees, I converted the values to consecutive numbers like (values are made up
for this example):
"less than 15 employees" -> 1
"between 15 and 50 employees" -> 2
"between 50 and 200 employees" -> 3
For the two discrete properties, product and trade, I had to create individual dimensions for each of the discrete values they can take. As in my example I was only going to use Baker and Accountant for trades and Shop and Business for product, the final dimensions vector ended up looking something like:
| shop | business | accountant | baker | turnover | employees | claims |
So let's say we wanted to model an accountant with a turnover of 50000, 20 employees, and 2 claims. His vector would look like:
| 0 | 1 | 1 | 0 | 50000 | 2 | 2 |
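As an illustration of how such a vector could be built (this is not the spike's actual code; the class, method, and customer id are made up), a small helper using Mahout's vector classes might look like this:

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.Vector;

// Illustrative only: one slot per discrete value (product and trade), plus the numeric dimensions.
// Layout: | shop | business | accountant | baker | turnover | employees | claims |
public class CustomerVectors {

    public static Vector vectorFor(String id, String product, String trade,
                                   double turnover, int employeesBand, int claims) {
        double[] dims = new double[]{
                "shop".equals(product) ? 1 : 0,
                "business".equals(product) ? 1 : 0,
                "accountant".equals(trade) ? 1 : 0,
                "baker".equals(trade) ? 1 : 0,
                turnover,
                employeesBand,   // e.g. 2 for "between 15 and 50 employees"
                claims
        };
        return new NamedVector(new DenseVector(dims), id);
    }

    public static void main(String[] args) {
        // The accountant from the example: turnover 50000, 20 employees (band 2), 2 claims.
        System.out.println(vectorFor("customer-1", "business", "accountant", 50000, 2, 2));
    }
}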
We can already see a problem with this vector. In particular, the value of turnover is much larger than the rest of the dimensions. This means that the distance calculation will be extremely influenced by this value: we say this value has a much bigger weight than the rest. For the example, we assume that there is a maximum turnover of 100000.
In our case, we want to give extra weight to the product and trade dimensions and make turnover much less significant.
Mahout offers some functionality for doing just that, normally as an implementation of the class WeightedDistanceMeasure. It works by building a Vector with multipliers for each of the dimensions of the original vector. This weight vector needs to be the same size as the dimensions vector. In our case, we could have a vector like this:
| 10 | 10 | 5 | 5 | 1/100000 | 1/2 | 1/10 |
The effect of that vector is to alter the values of the original by multiplying the product dimensions by 10 and the trade dimensions by 5, making sure that turnover is always less than 1, halving the influence of the number of employees, and making claims less influential.
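To see the effect of such a weight vector on the distance calculation, here is a small sketch using plain arrays. One reasonable convention, and essentially what Mahout's weighted Euclidean measure does, is to scale each squared difference by its weight; the exact formula in Mahout may differ in detail, so treat this as an illustration only.

// Sketch: weighted Euclidean distance where each dimension's contribution is scaled by its weight.
public class WeightedDistanceSketch {

    static final double[] WEIGHTS = {10, 10, 5, 5, 1.0 / 100000, 1.0 / 2, 1.0 / 10};

    static double weightedEuclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += WEIGHTS[i] * d * d;   // turnover differences are damped, product/trade amplified
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] accountant = {0, 1, 1, 0, 50000, 2, 2};
        double[] baker      = {0, 1, 0, 1, 52000, 2, 1};
        // Without weighting, the 2000 turnover gap would dwarf everything else;
        // with the weights it is scaled down to a magnitude comparable to the other dimensions.
        System.out.println(weightedEuclidean(accountant, baker));
    }
}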
NOTE: Finding the correct dimensions and weights for a clustering algorithm is a really hard exercise which normally requires multiple iterations to find the "best" solution. Our example, in keeping with the spike approach of this hackathon, is using completely arbitrary values chosen just to prove the technique, not carefully crafted normalizations of the data. If these values are good enough for our examples, then they are good enough.
Following are the main parts of the raw code written to convert the initial data to a list of vectors:
public class VectorCreationMapReduce extends Configured implements Tool {

    public static class VectorizerMapper extends Mapper<LongWritable, Text, Text, VectorWritable> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            VectorWritable writer = new VectorWritable();
            System.out.println(value.toString());
            String[] values = value.toString().split("\\|");
            double[] verticals = vectorForVertical(values[1]);
            double[] trade = vectorForTrade(values[2]);
            double[] turnover = vectorForDouble(values[3]);
            double[] claimCount = vectorForDouble(values[4]);
            double[] xCoordinate = vectorForDouble(values[7]);
            double[] yCoordinate = vectorForDouble(values[8]);
            NamedVector vector = new NamedVector(
                    new DenseVector(concatArrays(verticals, trade, turnover, claimCount, xCoordinate, yCoordinate)),
                    values[0]);
            writer.set(vector);
            context.write(new Text(values[0]), writer);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int res = ToolRunner.run(conf, new VectorCreationMapReduce(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] strings) throws Exception {
        Configuration conf = super.getConf();
        conf.set("fs.default.name", "hdfs://" + Configurations.HADOOP_MASTER_IP + ":9000/");
        conf.set("mapred.job.tracker", Configurations.HADOOP_MASTER_IP + ":9001");
        Job job = new Job(conf, "customer_to_vector_mapreduce");
        job.setJarByClass(VectorCreationMapReduce.class);
        job.setMapperClass(VectorizerMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(VectorWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPaths(job, "aggregated_customers_with_coordinates");
        FileOutputFormat.setOutputPath(job, new Path("vector_seq_file"));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
The code is a Hadoop map-reduce job (with only a map phase) that takes the input from the Sqoop-exported file and creates a NamedVector with the dimension values. The Vector classes used are Mahout-provided classes for use within Hadoop.
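The helper methods referenced above (vectorForVertical, vectorForTrade, vectorForDouble, concatArrays) are not shown in the excerpt. Assuming the one-hot encoding described earlier, static helpers placed in VectorCreationMapReduce would plausibly look something like the following sketch; the value sets are illustrative, not the real spike code:

// Hypothetical helpers for the mapper above; the real spike code may differ.
private static double[] vectorForVertical(String vertical) {
    // One slot per known product/vertical: | shop | business |
    return new double[]{"shop".equals(vertical) ? 1 : 0, "business".equals(vertical) ? 1 : 0};
}

private static double[] vectorForTrade(String trade) {
    // One slot per known trade: | accountant | baker |
    return new double[]{"accountant".equals(trade) ? 1 : 0, "baker".equals(trade) ? 1 : 0};
}

private static double[] vectorForDouble(String value) {
    // Numeric fields become single-element arrays; blank values default to 0.
    return new double[]{value == null || value.trim().isEmpty() ? 0 : Double.parseDouble(value.trim())};
}

private static double[] concatArrays(double[]... arrays) {
    int length = 0;
    for (double[] a : arrays) length += a.length;
    double[] result = new double[length];
    int pos = 0;
    for (double[] a : arrays) {
        System.arraycopy(a, 0, result, pos, a.length);
        pos += a.length;
    }
    return result;
}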
The next step was to run the actual Canopy algorithm to find the K centroids that we are going to feed to the K-Means algorithm. This is run something like:
./hadoop org.apache.mahout.clustering.canopy.CanopyDriver -i /vector_seq_file/part-m-00000 \
  -o customer-centroids -dm clustercustomers.mahout.CustomWeightedEuclideanDistanceMeasure \
  -t1 4.0 -t2 2.0
The previous command specifies the Mahout class that contains the Canopy Hadoop job. It specifies an input file from HDFS, which is the file generated by the previous vectorization process, and an output directory, customer-centroids, where the generated vector centroids will be written. We also specify that we want to use a Euclidean distance measure with weighting, which is defined with the custom class we see below:
public class CustomWeightedEuclideanDistanceMeasure extends WeightedEuclideanDistanceMeasure {

    // Double literals are needed here: an integer expression such as 1/2 would evaluate to 0.
    // The values follow the weighting vector described above.
    public static final Vector WEIGHTS = new DenseVector(new double[]{10, 10, 5, 5, 1.0/100000, 1.0/2, 1.0/10});

    public CustomWeightedEuclideanDistanceMeasure() {
        super();
        setWeights(WEIGHTS);
    }
}
This class simply extends the Mahout-provided WeightedEuclideanDistanceMeasure and sets the custom weight vector mentioned above. This makes sure that when the algorithm runs, the weighting is applied to all the vectors from the input.
Now that we have generated our K centroids, it is time to run the actual K-Means clustering algorithm. This is also very simple to run using the Mahout-provided classes:

hadoop org.apache.mahout.clustering.kmeans.KMeansDriver -i vector_seq_file/part-m-00000 \
  -c customer-centroids/clusters-0-final -o customer-kmeans \
  -dm clustercustomers.mahout.CustomWeightedEuclideanDistanceMeasure -x 10 -ow --clustering
In this command, we are specifying that we want to run the KMeansDriver Hadoop job. We pass in the input vector file again, and we specify the centroids file generated by Canopy with the -c option. We then specify where the clustering output should go, the use of the weighting mechanism again, and how many iterations we want to do on the data.
Here's a quick overview of how K-Means actually works:
The K-Means clustering algorithm starts with the given set of K centroids and iterates, adjusting the centroids, until the iteration limit X is reached or until the centroids converge to a point from which they don't move. Each iteration has two steps:
For each point in the input, it finds the nearest centroid and assigns the point to the cluster represented by that centroid.
At the end of the iteration, the points in each cluster are averaged to recalculate the new centroid position.
If the maximum number of iterations is reached, or the centroid points don't move any more, the clustering concludes.
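As a minimal standalone illustration of those two steps (assignment, then centroid update), here is a plain Java sketch. It is not the Mahout implementation, which performs the same logic as map-reduce jobs over HDFS:

import java.util.List;

// Minimal K-Means sketch: assign each point to its nearest centroid, then move each centroid to the mean.
public class KMeansSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    static double[][] cluster(List<double[]> points, double[][] centroids, int maxIterations) {
        int k = centroids.length;
        int dims = centroids[0].length;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            // Assignment step: each point joins the cluster of its nearest centroid.
            for (double[] p : points) {
                int nearest = 0;
                for (int c = 1; c < k; c++) {
                    if (distance(p, centroids[c]) < distance(p, centroids[nearest])) nearest = c;
                }
                counts[nearest]++;
                for (int d = 0; d < dims; d++) sums[nearest][d] += p[d];
            }
            // Update step: each centroid becomes the mean of the points assigned to it.
            boolean moved = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;                 // leave empty clusters where they are
                for (int d = 0; d < dims; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (Math.abs(mean - centroids[c][d]) > 1e-9) moved = true;
                    centroids[c][d] = mean;
                }
            }
            if (!moved) break;                                // converged: centroids no longer move
        }
        return centroids;
    }
}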

K-Means (and Canopy as well) are parallelizable algorithms, meaning that you can have many jobs working on a subset of the problem and aggregating results. This is where Hadoop comes in for the clustering execution. Internally, Mahout -- and in particular KMeansDriver -- is built to work on the Hadoop map-reduce infrastructure. By leveraging Hadoop's proven map-reduce implementation, Mahout algorithms are able to scale to very big data sets and process them in parallel.
After generating the clusters, the next step is to create individual per-cluster files, plus a single combined cluster file with the simple syntax (cluster_id, customer_id). This is done with the following map and reduce methods:
public static class ClusterPassThroughMapper extends Mapper<IntWritable, WeightedVectorWritable, IntWritable, Text> {

    public void map(IntWritable key, WeightedVectorWritable value, Context context) throws IOException, InterruptedException {
        NamedVector vector = (NamedVector) value.getVector();
        context.write(key, new Text(vector.getName()));
    }
}

public static class ClusterPointsToIndividualFile extends Reducer<IntWritable, Text, IntWritable, Text> {

    private MultipleOutputs mos;

    public void setup(Context context) {
        mos = new MultipleOutputs(context);
    }

    public void reduce(IntWritable key, Iterable<Text> value, Context context) throws IOException, InterruptedException {
        for (Text text : value) {
            mos.write("seq", key, text, "cluster" + key.toString());
            context.write(key, text);
        }
    }

    public void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}
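The driver that wires these two classes together is not shown in the excerpt. A rough sketch of what it might look like follows; the enclosing class name, the input path (the clusteredPoints directory produced by K-Means), the output path, and the choice of TextOutputFormat for the "seq" named output are all assumptions on my part:

// Hypothetical driver for the map and reduce classes above; paths and names are assumptions.
@Override
public int run(String[] args) throws Exception {
    Configuration conf = super.getConf();
    Job job = new Job(conf, "cluster_points_to_files");
    job.setJarByClass(ClusterPointsJob.class);                // assumed enclosing class
    job.setMapperClass(ClusterPassThroughMapper.class);
    job.setReducerClass(ClusterPointsToIndividualFile.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);   // K-Means writes sequence files
    job.setMapOutputKeyClass(IntWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    // Extra named output ("seq") so each cluster also gets its own file, e.g. cluster3-r-00000.
    // Text output is assumed here so the Play service can later read the files as plain lines.
    MultipleOutputs.addNamedOutput(job, "seq", TextOutputFormat.class, IntWritable.class, Text.class);
    FileInputFormat.addInputPaths(job, "customer-kmeans/clusteredPoints");
    FileOutputFormat.setOutputPath(job, new Path("individual-clusters"));
    return job.waitForCompletion(true) ? 0 : 1;
}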
This has allowed us to obtain clusters of customers - next week's post will explore what can be
done with these clusters!

PART 2
Read the first part of this hackathon implementation at Clustering our customers to get a full background on what's being presented here!

Analysing the data


At the end of the first post, we had obtained various clusters of customers in HDFS. This is the most important part of the job. However, we still need to do something with those clusters. We thought about different possibilities: "What is the most frequently asked question among members of a cluster?", "What is the average premium that people pay in a particular cluster?", "What is the average time people have been insured with us in a particular cluster?".
Most of the analyses we defined here were straightforward aggregations, a job that Hadoop map-reduce does nicely (more so as the data is already stored on HDFS).
There are many ways to analyse data with Hadoop including Hive, Scalding, Pig, and some others.
Most of these do translations from a particular language to the native map-reduce algorithms
supported by Hadoop. I chose Apache Pig for my analysis as I really liked the abstraction it creates
on top of the standard Hadoop map-reduce. It does this by creating a language called Pig Latin. Pig
Latin is very easy to read and write, both for people familiar with SQL analysis queries and for
developers familiar with a procedural way of doing things.
Here I will show just one example: how to calculate the average premium for each cluster. Following is the Pig program that does this:
premiums = LOAD '/user/cscarion/imported_quotes' USING PigStorage('|') AS(rfq_id, premium, insurer);
cluster = LOAD '/user/cscarion/individual-clusters/part-r-00000' AS(clusterId, customerId);
customers = LOAD '/user/cscarion/aggregated_customers_text' USING PigStorage('|') AS(id, vertical, trade, turnover, claims, rfq_id);

withPremiums = JOIN premiums BY rfq_id, customers BY rfq_id;

STORE withPremiums INTO 'withPremiums' USING PigStorage('|');

groupCluster2 = JOIN withPremiums BY customers::id, cluster BY customerId;

grouped2 = GROUP groupCluster2 BY cluster::clusterId;

premiumsAverage = FOREACH grouped2 GENERATE group, AVG(groupCluster2.withPremiums::premiums::premium);

STORE premiumsAverage INTO 'premiumsAverage' USING PigStorage('|');
The previous Pig program should be fairly straightforward to understand:
The RFQs with their respective premiums are loaded from HDFS into tuples like (rfq_id, premium, insurer).
The cluster information is loaded from HDFS into tuples like (cluster_id, customer_id).
The customers are loaded from the originally imported file into tuples like (id, vertical, trade, turnover, claims, rfq_id).
The RFQs are joined with the customers using the rfq_id. This basically adds the premium to the customer collection.
The result of the previous join is joined with the cluster tuples. This essentially adds the cluster_id to the customer collection.
The results of the previous join are grouped by cluster_id.
For each group computed in the previous step, the average premium is calculated.
The results from the previous step are stored back into HDFS as tuples like (cluster_id, average_premium).
Querying the data from the main application
Now that we have the data and the analytics computed over it, the next step is to consume this data from the main web application. For this, and to make it as unobtrusive as possible, I created a new server application (using Play) with a simple interface that was connected to HDFS and could be queried from our main application.

So for example, when a customer is filling out the form, we can invoke an endpoint on this new service like this: GET /?trade=accountant&product=business&claims=2&turnover=25000&employees=2
This call will vectorise that information, find the correct cluster, and return the information for that cluster. The important parts of the code follow:
def similarBusinesses(business: Business): Seq[Business] = {
  loadCentroids()
  val cluster = clusterForBusiness(business)
  val maxBusinesses = 100
  var currentBusinesses = 0
  CustomHadoopTextFileReader.readFile(s"hdfs://localhost:9000/individual-clusters/cluster${cluster}-r-00000") {
    line =>
      val splitted = line.split("\t")
      userIds += splitted(1)
      currentBusinesses += 1
  }(currentBusinesses < maxBusinesses)
  businessesForUserIds(userIds)
}
That code finds the cluster for the business arriving from the main application. Then it reads the HDFS file representing that individual cluster and gets the business information for the returned user ids.
To find the cluster to which the business belongs, we compare against the stored centroids:
private def clusterForBusiness(business: Business): String = {
  val businessVector = business.vector
  var currentDistance = Double.MaxValue
  var selectedCentroid: (String, Cluster) = null
  for (centroid <- SimilarBusinessesRetriever.centroids) {
    if (distance(centroid._2.getCenter, businessVector) < currentDistance) {
      currentDistance = distance(centroid._2.getCenter, businessVector)
      selectedCentroid = centroid
    }
  }
  clusterId = Integer.valueOf(selectedCentroid._1)
  selectedCentroid._1
}
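The distance helper used above is not shown either. One way to keep the lookup consistent with how the clusters were built is to reuse the same weighted measure from the batch job; sketched here in Java (the Scala service would call the same Mahout classes), under the assumption that the Mahout jars and the custom measure are on the Play application's classpath:

// Sketch: reuse the clustering distance measure so cluster assignment matches the batch job.
import clustercustomers.mahout.CustomWeightedEuclideanDistanceMeasure;
import org.apache.mahout.common.distance.DistanceMeasure;
import org.apache.mahout.math.Vector;

public class ClusterDistance {

    private static final DistanceMeasure MEASURE = new CustomWeightedEuclideanDistanceMeasure();

    public static double distance(Vector center, Vector businessVector) {
        return MEASURE.distance(center, businessVector);
    }
}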
The code that actually reads from the Hadoop filesystem follows; it looks much like reading a simple file:
object CustomHadoopTextFileReader {
  def readFile(filePath: String)(f: String => Unit)(g: => Boolean = true) {
    try {
      val pt = new Path(filePath)
      val br = new BufferedReader(new InputStreamReader(SimilarBusinessesRetriever.fs.open(pt)))
      var line = br.readLine()
      while (line != null && g) {
        f(line)
        line = br.readLine()
      }
    } catch {
      case e: Exception =>
        e.printStackTrace()
    }
  }
}
Then to return the premium for a particular cluster:
def averagePremium(cluster: Int): Int = {
  CustomHadoopTextFileReader.readFile("hdfs://localhost:9000/premiumsAverage/part-r-00000") {
    line =>
      val splitted = line.split("\\|")
      if (cluster.toString == splitted(0)) {
        return Math.ceil(java.lang.Double.valueOf(splitted(1))).toInt
      }
  }(true)
  0
}
Any of these values can then be returned to the calling application -- this can provide a list of "similar" businesses or retrieve the average premium paid by those similar businesses.
This was it for the first iteration of the hack! The second iteration was to use the same infrastructure to cluster customers based on the location of their businesses instead of the dimensions used here. Given that the basic procedure for clustering remains the same regardless of the dimensions used, the code for that looks similar to the code presented here.
Published at DZone with permission of Carlo Scarioni, author and DZone MVB. (source)
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: Big Data, Theory
