Chapter 4
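-Pig data types
-divided into scalar and complex types
-scalar types are represented in Pig's interfaces by java.lang classes (bytearray is the exception: it is a Pig class wrapping a Java byte[])
-scalar types: int, long, float, double, chararray, and bytearray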
Complex types: map, tuple, and bag
Casts
-casts go in parentheses, e.g. (int)x
-casts to bytearray are never allowed, but casts from bytearray to any other scalar type are
-casts to/from complex types are not allowed
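PigStorage/TextLoader
-both support globs
-globs let you read multiple files, including files that are not in the same directory
-glob syntax is version specific: it depends on the version of Hadoop/HDFS underneath Pig
-when passing a glob on the command line, escape it to prevent the shell from expanding it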
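As a rough illustration of how one glob can pick up files from several directories, the sketch below matches paths one directory level at a time using Python's standard fnmatch module. The paths and the helper name glob_match are made up for the example, and fnmatch only approximates the glob syntax of the underlying file system, which depends on the Hadoop version.

```python
from fnmatch import fnmatchcase

def glob_match(pattern, path):
    """Return True if path matches pattern, one directory level at a time.

    Wildcards never cross a '/' boundary here; that is an assumption of
    this sketch, not a statement about any particular Hadoop version.
    """
    pattern_parts = pattern.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat)
               for pat, part in zip(pattern_parts, path_parts))

# Hypothetical input files spread over several directories.
paths = [
    "/data/2009/01/part-00000",
    "/data/2009/02/part-00000",
    "/data/2010/01/part-00000",
    "/data/2009/01/_logs",
]

# One pattern selects the part files from every 2009 subdirectory.
matched = [p for p in paths if glob_match("/data/2009/*/part-*", p)]
```

Here matched holds the two part files under /data/2009, even though they live in different directories.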
Store
Use store to save output on HDFS. By default the output is written with PigStorage as a tab-delimited file
UDFs
-use with foreach statement
Filter
versus
Order by
Join
-
Limit
-lets you see a limited number of results
Parallelism
-Python UDFs
-the Python script must be in the current working directory
Must use
-throws IllegalArgumentException
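A Python UDF is an ordinary function in a script that Pig runs through Jython. The sketch below is a hypothetical example: the file name udfs.py, the function word_length, and the schema string are made up for illustration. Pig normally supplies the outputSchema decorator when the script is registered; the fallback definition is only there so the file also runs under a plain Python interpreter.

```python
# udfs.py (hypothetical file name)

try:
    outputSchema  # normally supplied by Pig when the script is registered
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the script also runs outside Pig."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

@outputSchema("word_length:int")
def word_length(word):
    """Return the length of a chararray, or None (a Pig null) for null input."""
    if word is None:
        return None
    return len(word)
```

In a Pig Latin script such a file would typically be registered with something like register 'udfs.py' using jython as myudfs; and then called inside a foreach as myudfs.word_length(...).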
Chapter 6 - Advanced Pig Latin
-flatten
-Joins
-Pig lets the user choose the join implementation via the using clause
Joining a small dataset to a large one
Fragment-replicate join: fragment the large file across the map tasks and replicate the small file to each of them
-uses only a map phase (no reduce).
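The idea behind a fragment-replicate join can be sketched in plain Python: every "map task" receives one fragment of the large input plus a full in-memory copy of the small input, so the join finishes without a shuffle or reduce step. The function name replicated_join and the sample records are hypothetical; this shows the technique, not Pig's actual implementation.

```python
def replicated_join(large_fragment, small_records, large_key=0, small_key=0):
    """Join one fragment of the large input against the whole small input."""
    # Build an in-memory hash table from the small (replicated) input.
    lookup = {}
    for rec in small_records:
        lookup.setdefault(rec[small_key], []).append(rec)
    # Stream the fragment and emit joined rows; no reduce phase is needed.
    joined = []
    for rec in large_fragment:
        for match in lookup.get(rec[large_key], []):
            joined.append(rec + match)
    return joined

# Hypothetical data: the small input must fit in memory on every map task.
small = [(1, "one"), (2, "two")]
large = [(1, "x"), (3, "y"), (2, "z"), (1, "w")]

# Each simulated map task gets one fragment of large and all of small.
fragments = [large[:2], large[2:]]
result = []
for fragment in fragments:
    result.extend(replicated_join(fragment, small))
```

In Pig Latin this strategy is requested with using 'replicated' on the join, with the small input listed last so that it is the one loaded into memory.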