Prescribe computing metrics#
Value#
Many generic terms are heavily overloaded (reused) in the context of measurement in computing. This article attempts to restore meaning to these terms based on as many external references as possible, to at least achieve consistency in the context of this blog.
Cost#
Define a list of terms.
Pipeline#
See Pipeline (computing). Following that article, call pipelines that are implemented as simple serial execution (each stage runs to completion before the next one starts) “serial execution pipelines” to distinguish them from true pipelines. Avoid the term “serial pipelines”, which is potentially confusing because all pipelines are connected in series.
Call a true pipeline a “multiprocess pipeline” or “parallelized pipeline” when it needs to be distinguished from a serial execution pipeline. Prefer the term “multiprocess pipeline” for pipelines that run concurrently but not in parallel, e.g. Unix pipelines on a single-CPU machine. I’ve not seen the term “concurrent pipeline” in the wild, but if needed it could cover both parallelized and multiprocess computing pipelines.
In practice, serial execution pipelines are often simply called pipelines. This kind of pipeline could in theory be turned into a concurrent pipeline if the need appeared; if you only ever need one of something at a time, the need is unlikely to appear.
If buffer size (and the loss of cache locality) is not a concern, you could run each filter in a Unix pipeline to completion one at a time, or fully process every input unit one at a time, and get the same answer in the same execution time. Typically we choose to limit work in progress (WIP) with a Unix pipeline on a single-CPU machine; this gets you first results faster and hides internals (allowing one-liners to describe the whole pipeline).
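The two orderings described above can be sketched in Python. This is only an analogy: generators stand in for Unix filters, they are not separate processes, and the names `parse`, `square`, `serial_execution`, and `streaming` are made up for illustration. The sketch shows that running each filter to completion and pushing one unit through all filters at a time give the same answer, while the second ordering yields a first result after processing a single unit.

```python
from typing import Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[int]:
    """Filter 1 (hypothetical): turn each text line into an integer."""
    for line in lines:
        yield int(line)


def square(numbers: Iterable[int]) -> Iterator[int]:
    """Filter 2 (hypothetical): square each number."""
    for n in numbers:
        yield n * n


def serial_execution(lines: List[str]) -> List[int]:
    """Run each filter to completion before starting the next one.

    The whole intermediate product is buffered in a list, so WIP is
    as large as the input.
    """
    parsed = list(parse(lines))
    return list(square(parsed))


def streaming(lines: Iterable[str]) -> Iterator[int]:
    """Chain the filters lazily: each unit passes through every filter
    before the next unit is read, so WIP stays at one unit."""
    return square(parse(lines))


data = ["1", "2", "3"]

serial_result = serial_execution(data)

stream = streaming(data)
first = next(stream)  # available after processing only the first unit
streamed_result = [first, *stream]
```

Both `serial_result` and `streamed_result` hold the same values; only the order in which the work happened, and the amount of buffering, differ.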
An example of a pipeline is a machine learning model’s evaluation pipeline. The fundamental unit of work is a visualization of model performance. Intermediate products are a dataset, a trained model, the results of evaluation inference, and plain text evaluation results. Another example is the pipelines defined in a .gitlab-ci.yml file.
Process#
The term “process” is ambiguous (see Process). The unqualified term usually means either:
In computing the term “process” can also simply mean “something that takes up time”; in embedded operating systems what you think of as a computer process is actually called a task (see Multitasking and process management).
Even within a Unix or Linux context, the unqualified term “process” does not precisely identify the abstract computing process we are often talking about. Use the term “main process” or “child process” or “parent process” to distinguish a computer process from the “overall” process (usually identified with the main process). See:
Trace#
Notice that Tracing (software) has no link to Profiling (computer programming) and vice versa. In my opinion, the term “profiling” should be preferred to “tracing” when you’re talking about numerical statistics. You would do “tracing” to produce low-level logging information (semantic/textual, not numerical); hence the trace logging level. In practice, logs contain a mix of semantic and numerical information. Turn on verbose logging to get “trace” level information; a trace is small (or detailed), as the phrase “didn’t leave a trace” implies.
The confusion in the term likely originates from the fact that you can use “past evidence” to help you “draw out” how something works. See trace - Wiktionary.
Past evidence#
Define a “trace” as a log of execution of a process or a method, that is, a specialized log:
If tracing is a specialized use of the term logging, when should something be called a trace rather than a log? See the guidelines in Event logging versus tracing.
For example, strace provides a specialized log of system calls, and ltrace a specialized log of library calls.
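The same idea, a specialized log of execution, can be sketched inside a single Python program with the standard `sys.settrace` hook. This is a minimal illustration, not how strace or ltrace work internally; the helper `collect_call_trace` and the functions `work` and `helper` are hypothetical names. The log here is specialized to Python function calls.

```python
import sys
from typing import Callable, List


def collect_call_trace(func: Callable[[], object]) -> List[str]:
    """Run func and return the names of the Python functions called, in order."""
    events: List[str] = []

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for each new frame.
        if event == "call":
            events.append(frame.f_code.co_name)
        return None  # skip line-level tracing inside the frame

    previous = sys.gettrace()  # preserve any tracer that was already installed
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)
    return events


def helper() -> int:
    return 1


def work() -> int:
    return helper() + helper()


trace_log = collect_call_trace(work)
```

The resulting `trace_log` is textual (a sequence of names in execution order) rather than numerical, which is the distinction this article draws between a trace and a profile.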
Draw out#
See “Desk checking” and “Live debugging” in What’s difference between monitoring, tracing and profiling? - Server Fault. Don’t use the term this way; prefer “sketch” or “draw” to avoid the conflict in terms.
Profile#
Profiling is the process of running a profiler (on an application, system, or process). A profiler is a tool, and every profiler measures something different (see this Definition of debugging, profiling and tracing). Because of this, profilers tend to be specific to a language or platform (see What’s difference between monitoring, tracing and profiling?).
When a profiler collects numerical/statistical data, the result is called a “profile” of the execution. If a profiler collects plain text, the result is called a “trace” or a log (see Profiling (computer programming)).
Deterministic (Event-based) Profiling#
These are often language-specific; see Event-based profilers. In the article Off-CPU Analysis, this concept seems to be known as “Application Tracing”. See that article for a better explanation of the fundamental difference between this approach and CPU sampling. In particular, it explains some of the upsides and downsides of cProfile in Python.
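As a small illustration of deterministic profiling, the sketch below runs Python’s standard `cProfile` on a toy workload and reads back call counts. The functions `slow_sum` and `workload` are hypothetical, and reading the `stats` dictionary on `pstats.Stats` directly is a convenience I am assuming is acceptable here; `print_stats()` is the usual way to view the same data.

```python
import cProfile
import pstats


def slow_sum(n: int) -> int:
    """Toy workload (hypothetical): sum the integers below n in a loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def workload() -> int:
    return slow_sum(10_000) + slow_sum(20_000)


profiler = cProfile.Profile()
# runcall records every function call and return made while workload runs.
result = profiler.runcall(workload)

stats = pstats.Stats(profiler)
# stats.stats maps (filename, line number, function name) to a tuple of
# (primitive calls, total calls, total time, cumulative time, callers).
call_counts = {key[2]: value[1] for key, value in stats.stats.items()}
```

Because every call is recorded as an event, the counts are exact (`slow_sum` is called exactly twice) rather than estimated, which is what separates this approach from sampling. The price is overhead on every call.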