Stefan Edlich
Hadoop Overview
Modeled after Google BigTable; initial push from Powerset / Yahoo; queries via Map/Reduce
http://labs.google.com/papers/bigtable.html
HBase
Rows and columns (associative array); versioning; each row is identified by its key; separator: ':'
Data Model
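The data model above can be pictured as nested sorted maps: row key → column family → qualifier → timestamp → value, with family:qualifier (joined by the ':' separator) as the full column name and newer versions shadowing older ones. A minimal stand-alone sketch using plain Java collections (illustrative names, not the HBase API):

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DataModelSketch {
    // row key -> ("family:qualifier" -> (timestamp, newest first -> value))
    static final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> store =
            new TreeMap<>();

    // store a cell version under (row, column, timestamp)
    static void put(String row, String column, long ts, String value) {
        store.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    // a plain get returns the newest version of the cell
    static String get(String row, String column) {
        return store.get(row).get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("FirstRowKey", "F1:FirstColumn", 1L, "Old Value");
        put("FirstRowKey", "F1:FirstColumn", 2L, "First Value");
        System.out.println(get("FirstRowKey", "F1:FirstColumn")); // newest version wins
    }
}
```

Because the timestamp map is sorted newest-first, keeping only the last N versions (HBase's MaxVersions) would amount to trimming each inner map after N entries.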
Not trivial; requires POSIX/UNIX and Java 6
$ tar -xzf hbase-x.y.z.tar.gz
$ cd ./hbase-x.y.z
$ bin/start-hbase.sh
Initially a single-node cluster
Installation
Command-line options
import java.io.IOException;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;

public class FirstHBaseClient {
    public static void main(String[] args) throws IOException {
        // describe a table "Table" with one column family "F1"
        HTableDescriptor table = new HTableDescriptor("Table");
        HColumnDescriptor family = new HColumnDescriptor("F1");
        family.setMaxVersions(2);   // keep at most two versions per cell
        table.addFamily(family);

        // create the table, then list all tables
        HBaseConfiguration config = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(config);
        admin.createTable(table);
        HTableDescriptor[] tables = admin.listTables();
        for (int i = 0; i < tables.length; i++)
            System.out.println(tables[i].getNameAsString());
    }
}
via Java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstHBaseClient {
    public static void main(String[] args) throws IOException {
        HBaseConfiguration config = new HBaseConfiguration();
        HTable table = new HTable(config, "Table");
        // write one cell: row "FirstRowKey", family "F1", qualifier "FirstColumn"
        Put p = new Put(Bytes.toBytes("FirstRowKey"));
        p.add(Bytes.toBytes("F1"), Bytes.toBytes("FirstColumn"),
              Bytes.toBytes("First Value"));
        table.put(p);
    }
}
Insert = Put
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FirstHBaseClient {
    public static void main(String[] args) throws IOException {
        HBaseConfiguration config = new HBaseConfiguration();
        HTable table = new HTable(config, "Table");
        // read the cell back: row "FirstRowKey", family "F1", qualifier "FirstColumn"
        Get get = new Get(Bytes.toBytes("FirstRowKey"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("F1"),
                                       Bytes.toBytes("FirstColumn"));
        System.out.println(Bytes.toString(value));
    }
}
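Besides single-row Gets, HBase can scan a range of rows: because rows are stored sorted by key, a scan is conceptually a range iteration from a start row (inclusive) to a stop row (exclusive). A minimal sketch of that idea using a plain sorted map as a stand-in for the real Scan/ResultScanner API (illustrative, not HBase classes):

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ScanSketch {
    public static void main(String[] args) {
        // rows kept sorted by key, as HBase stores them
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("row-a", "value A");
        rows.put("row-b", "value B");
        rows.put("row-c", "value C");

        // range scan: start row inclusive, stop row exclusive,
        // mirroring HBase's Scan(startRow, stopRow) semantics
        for (Map.Entry<String, String> e : rows.subMap("row-a", "row-c").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // prints row-a and row-b; row-c is excluded
    }
}
```

This sorted-by-key layout is also why row-key design matters in HBase: rows that should be read together in one scan need adjacent keys.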
$ cp contrib/stargate/hbase-[version]-stargate.jar lib/
$ cp contrib/stargate/lib/* lib/
$ bin/hbase-daemon.sh start org.apache.hadoop.hbase.stargate.Main -p <port>
$ curl http://localhost:8080/Table/schema   // TEST
{ NAME => Table, IS_META => false, IS_ROOT => false,
  COLUMNS => [ { NAME => F1, BLOCKSIZE => 65536, BLOOMFILTER => false,
    BLOCKCACHE => true, COMPRESSION => NONE, VERSIONS => 2,
    TTL => 2147483647, IN_MEMORY => false } ] }
Scaling
Advantages
Scaling; Map/Reduce; community; APIs
Disadvantages
Setup / tuning / maintenance effort (Cloudera?); Map/Reduce required; no DB-style replication