They differ significantly from traditional datacenters: they belong to a single organization, use a relatively homegeneous hardware and system software platform, and share a common systems management layer.
Most importantly, WSCs run a smaller number of very large applications(or Internet Services).
The relentless demand for more computing capabilities makes cost efficiency a primary metric of interest in the design of WSCs.
However, network switches with high port counts, which are needed to tie together WSC clusters, have a much different price structure and are more than 10 times more expensive (per 1Gps port) than comodity switches.
每个 rack 一个低端交换机，上面一个高端的核心交换机。
A switch that has 10 times the bi-section bandwidth costs about 100 times as much. As a result of this cost discontinuity, the networking fabric of WSCs is often organized as the two level hierarchy despicted on Figure1.1
In such a network, programmers must be aware of the relatively scarce cluster-level bandwidth resources and try to exploit rack level networking locality.
one can remove some of the cluster level networking bottlenecks by spending more money on the interconnect fabric. for example, Infiniband interconnects typically scales to a few thousand ports but can cost $500~$2000 per port.
现在该 clos 布局弄的比较多，局部性是需要适应的现状，也可以看做一个可以解决的问题。
Alternatively, lower-cost fabrics can be formed from commodity Ethernet switches by building "fat tree" Clos network.
A large application that requires many more servers than can fit on a single rack must deal effectively with these large discrepancies in latency, bandwidth, and capacity.
these discrepancies are much larger than those seen on a single machine, making it more difficult to program a WSC.
key challenge for architects of WSCs is to smooth out these discrepancies in a cost efficient manner. conversely, a key challenge for software architects is to build cluster infrastructure and services that hide most of this complexity from application developers
Although this breakdown can vary significantly depending on how systems are configured for a given workload domain, the graph indicates that CPUs can no longer be the sole focus of en- ergy efficiency improvements because no one subsystem dominates the overall energy usage profile. .
Typical Internet services exhibit a large amount of parallelism stemming from both data- and request-level parallelism. Usually, the problem is not to find parallelism but to manage and efficiently harness the explicit parallelism that is inherent in the application
A beneficial side effect of this aggressive software deployment environment is that hardware architects are not necessarily burdened with having to provide good performance for immutable pieces of code. Instead, architects can consider the possibil- ity of significant software rewrites to take advantage of new hardware capabilities or devices.
Homogeneity within a platform generation simplifies cluster-level scheduling and load balancing and reduces the maintenance burden for platforms software (kernels, drivers, etc.).
Ideally, the cluster-level system software should provide a layer that hides most of that complexity from application-level software, although that goal may be difficult to accomplish for all types of applications.
简化复杂度的地方：1. 不必挖掘并行；2. 基础组件同构；增加复杂度的地方：1. 需要日常地应对硬件故障；2. 访问模式异构；
Although the plentiful thread-level parallelism and a more homogeneous computing platform help reduce software development complexity in Internet services compared to desktop systems, the scale, the need to operate under hardware failures, and the speed of workload churn have the opposite effect.
Once the diverse requirements of multiple services are considered, it becomes clear that the datacenter must be a general-purpose computing system.
There is opportunity here to find solutions by software- hardware co-design, while being careful not to arrive at machines that are too complex to program.
The most cost-efficient and balanced configuration for the hardware may be a match with the combined resource requirements of multiple workloads and not necessarily a perfect fit for any one workload
Fungible resources tend to be more efficiently used.