Notes on "Programming Pig" (1), by [deactivated account]


Read: Programming Pig

Programming Pig
  • Title: Programming Pig
  • Author: Alan Gates
  • Pages: 222
  • Publisher: O'Reilly Media
  • Publication date: 2011-10-20
  • Page 9: Pig Philosophy
    Pigs eat anything
    Pig can operate on data whether it has metadata or not. It can operate on data that is relational, nested, or unstructured. And it can easily be extended to operate on data beyond files, including key/value stores, databases, etc.

    Pigs live anywhere
    Pig is intended to be a language for parallel data processing. It is not tied to one particular parallel framework. It has been implemented first on Hadoop, but we do not intend that to be only on Hadoop.

    Pigs are domestic animals
    Pig is designed to be easily controlled and modified by its users. Pig allows integration of user code wherever possible, so it currently supports user defined field transformation functions, user defined aggregates, and user defined conditionals. These functions can be written in Java or in scripting languages that can compile down to Java (e.g., Jython). Pig supports user provided load and store functions. It supports external executables via its stream command and MapReduce JARs via its mapreduce command. It allows users to provide a custom partitioner for their jobs in some circumstances, and to set the level of reduce parallelism for their jobs. Pig has an optimizer that rearranges some operations in Pig Latin scripts to give better performance, combines MapReduce jobs together, etc. However, users can easily turn this optimizer off to prevent it from making changes that do not make sense in their situation.

    Pigs fly
    Pig processes data quickly. We want to consistently improve performance, and not implement features in ways that weigh Pig down so it can’t fly.
    Quoted from Pig Philosophy
    2013-04-16 14:22:13