We use the term concurrency to refer to the general concept of a system with
multiple, simultaneous activities, and the term parallelism to refer to the use of
concurrency to make a system run faster.
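Figure 5-22 shows the effect of k-way loop unrolling combined with k-way parallelism, for k up to 6. We can see that the CPE of every combining case improves (that is, decreases) as k increases. For integer multiplication and for the floating-point operations, the CPE is L/k, where L is the latency of the operation, down to the throughput bound of 1.00. We also see that integer addition reaches this bound with standard unrolling.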
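The relationship in this excerpt can be sketched as a small model: the CPE is L/k until it hits the throughput bound. The function name `predicted_cpe` and the 3-cycle latency are illustrative assumptions, not values taken from the figure.

```python
def predicted_cpe(latency: float, k: int, throughput_bound: float = 1.0) -> float:
    """Model from the excerpt: CPE is L/k, but never below the throughput bound."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return max(latency / k, throughput_bound)


# Hypothetical operation latency of 3 cycles (an assumption for illustration).
cpe_by_k = {k: predicted_cpe(3.0, k) for k in range(1, 7)}

for k, cpe in cpe_by_k.items():
    print(f"k={k}: CPE={cpe:.2f}")
```

Under this model, raising k beyond the latency-to-bound ratio gives no further gain, because the functional units are already saturated.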
This book is written from a programmer's perspective, describing how application programmers can use their knowledge of a system to write better programs.
Building high-performance Web servers. Many Web servers generate dynamic content, such as personalized Web pages, account balances, and banner ads. Early Web servers generated dynamic content by using fork and execve to create a child process and run a “CGI program” in the context of the child. However, modern high-performance Web servers can generate dynamic content using a more efficient and sophisticated approach based on dynamic linking.
The idea is to package each function that generates dynamic content in a shared library. When a request arrives from a Web browser, the server dynamically loads and links the appropriate function and then calls it directly, as opposed to using fork and execve to run the function in the context of a child process. The function remains cached in the ser...
But if we had a 32-bit address space, 4 KB pages, and a 4-byte PTE [page table entry; Yang's note], then we would need a 4 MB page table resident in memory at all times...
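The 4 MB figure follows from simple arithmetic: one PTE per virtual page, times the size of each PTE. A minimal sketch, assuming a single-level (flat) page table; the function name is hypothetical.

```python
def linear_page_table_bytes(address_bits: int, page_size: int, pte_size: int) -> int:
    """Size of a flat page table that holds one PTE per virtual page."""
    num_pages = (1 << address_bits) // page_size  # 2^32 / 2^12 = 2^20 pages
    return num_pages * pte_size


# Parameters from the excerpt: 32-bit addresses, 4 KB pages, 4-byte PTEs.
size = linear_page_table_bytes(address_bits=32, page_size=4 * 1024, pte_size=4)
print(size // (1024 * 1024), "MB")
```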
Hyperthreading, sometimes called simultaneous multi-threading, is a technique that allows a single CPU to execute multiple flows of control. It involves having multiple copies of some of the CPU hardware, such as program counters and register files, while having only single copies of other parts of the hardware, such as units that perform floating-point arithmetic. Whereas a conventional processor requires around 20,000 clock cycles to shift between different threads, a hyperthreaded processor decides which of its threads to execute on a cycle-by-cycle basis. For example, if one thread must wait for some data to be loaded into a cache, the CPU can proceed with execution of a different thread.
Integer representations can encode a comparatively small range of values, but do so precisely, while floating-point representations can encode a wide range of values, but only approximately.
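The trade-off can be observed by rounding values to single precision. This is a minimal sketch that assumes 32-bit two's-complement integers and IEEE 754 single-precision floats; the helper `to_float32` is a hypothetical name, built on the standard `struct` module.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value of a 32-bit two's-complement int


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Wide range: a float can hold values far beyond INT32_MAX.
big = to_float32(1e38)

# Only approximate: 2^24 + 1 has no exact single-precision encoding,
# so it rounds to the neighboring representable value 2^24.
rounded = to_float32(2**24 + 1)
print(big > INT32_MAX, rounded == 2**24)
```

Every 32-bit integer up to 2^24 in magnitude survives the round trip exactly; beyond that, the gaps between representable floats grow wider than 1.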
When an operation is performed where one operand is signed and the other is unsigned, C implicitly casts the signed argument to unsigned and performs the operation assuming the numbers are nonnegative. As we will see, this convention makes little difference for standard arithmetic operations, but it leads to nonintuitive results for relational operators such as < and >.
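The effect can be modeled outside C by applying the conversion explicitly. The sketch below is a Python model of the rule, not C itself; it assumes a 32-bit `unsigned int`, and the helper names are hypothetical.

```python
UINT_BITS = 32  # assumption: unsigned int is 32 bits wide


def to_unsigned(x: int) -> int:
    """Model of C's signed-to-unsigned conversion: reduce modulo 2^32."""
    return x % (1 << UINT_BITS)


def mixed_less_than(signed_val: int, unsigned_val: int) -> bool:
    """Model of `signed_val < unsigned_val` when the right operand is unsigned."""
    return to_unsigned(signed_val) < unsigned_val


def mixed_add(signed_val: int, unsigned_val: int) -> int:
    """Model of `signed_val + unsigned_val`, wrapping modulo 2^32."""
    return (to_unsigned(signed_val) + unsigned_val) % (1 << UINT_BITS)


# -1 converts to 4294967295, so the model of `-1 < 0u` is false.
print(to_unsigned(-1), mixed_less_than(-1, 0))

# Addition still produces the expected bit pattern: -1 + 1 wraps to 0.
print(mixed_add(-1, 1))
```

The comparison surprises because it depends on the magnitude of the converted value, while addition depends only on the bit pattern, which is the same under either interpretation.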
Since program instructions are stored in memory and must be fetched (read) by the CPU, we can also evaluate the locality of a program with respect to its instruction fetches. ... for loop are executed in sequential memory order, and thus the loop enjoys good spatial locality. Since the loop body is executed multiple times, it also enjoys good temporal locality.
Processor macroarchitecture specifications often distinguish between asynchronous "interrupts" and synchronous "exceptions," yet provide no umbrella term to refer to these very similar concepts.