# Improve SSC


Perhaps "greater than" and "less than" are so important to humans because we often think in terms of gravity. Consider saying "above" and "below", or "as high as", rather than the other language when you're stuck. To think in terms of preorders rather than linear orders is then to think in another dimension (left and right) as well. Two things can be the same height but not be "comparable" because they are at different places across left and right. It seems like it all comes back to our 3-dimensional thinking. Are meets the way to "go down" and joins the way to "go up" in this view? If you "want" to go up, do you use the join (i.e. addition or the logical or)? In Cost, is this why we have to reverse the order? Why is \(I \leq\) in the definition of a \(\mathcal{V}\)-category?

Does even forming a sentence require thinking ahead (creating an action graph)? You sometimes write the first few words of a sentence while half-thinking, and then need to erase them when you realize what you actually want to say. You often try to consider all the possible responses someone could give to a text before you write it; the more you plan ahead, the more likely you'll be able to get them to respond in a way that's OK with you. In some sense, talking is publishing.

You often want to redo old questions rather than checking the answer. Why? Do you want to confirm you remember all the dependencies that led to the result? If you wanted to rederive every result, then you wouldn't be reading books written by others (essentially taking their answers). You also wouldn't maintain any notes; you'd prefer to rederive the results from scratch regularly. The point of writing down the answer was for you to be able to refer to it later, and if you never refer to it the effort you put into writing down the answer was mostly wasted (at least with respect to you).

To some extent you've even memorized the numbers associated with definitions as part of reading this book (e.g. Definition 2.46). You've also likely memorized the location of results on pages. This is why many books start chapters only on odd-numbered pages (so results stay on the same side of the page through minor edits of other chapters). Hence, you really don't need to publish what you've changed back to the source.

You can read commutative diagrams like geographic maps, where the map simply repeats itself in many places. Think through one example of where the commutative diagram would apply, and you'll likely understand the pattern.

## Replace references with web links

Your references are broken anyway, so why not replace them with web links? That's what you'd prefer anyway. It'd also make your PDF better than the original, in your opinion, and so useful to others (worth sharing).

## Boxes in boxes

A drawing of "boxes in boxes" could look quite different from the side.

If you think of this as a presentation of a category then is this dependent types? A set of a certain size.

## Filters as graphs

Is there a "filter framework" in the public domain? It doesn't seem so, but see:

Can you see Unix filters as defining a DAG, where the boxes are the commands (functions) and the arrows in between are sets? That is, files are "sets" of lines and the filter runs on all lines in the set. The boxes are labeled with Unix commands. We use `tee` to get two arrows from one output.
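A minimal sketch of that picture in Python, assuming generators stand in for Unix filters and `itertools.tee` stands in for the `tee` command. The helper names (`grep`, `upper`, `count`) and the sample lines are hypothetical, and the "sets" here are really ordered sequences of lines.

```python
from itertools import tee


def grep(pattern, lines):
    """Filter box: keep lines containing `pattern` (roughly `grep`)."""
    return (line for line in lines if pattern in line)


def upper(lines):
    """Filter box: upper-case every line (roughly `tr a-z A-Z`)."""
    return (line.upper() for line in lines)


def count(lines):
    """Sink box: count the lines (roughly `wc -l`)."""
    return sum(1 for _ in lines)


# The "file" is the collection of lines flowing along the first arrow.
source = ["error: disk full", "ok", "error: timeout", "ok"]

# tee duplicates one output arrow into two, so the graph is a DAG
# rather than a simple chain of boxes.
left, right = tee(grep("error", source))

shouted = list(upper(left))  # one branch of the DAG
n_errors = count(right)      # the other branch
```

Each function is a box and each generator passed between them is an arrow; the branch point is the only place the graph stops being a straight pipeline.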

## Document how to visually take product orders

Add to Product order. You had to learn the hard way how to do this for non-total orders: see `pip-feasibility-relation.svg` for the start of how you learned to do it visually.
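Before drawing a product order, it can help to enumerate it. The following is only a sketch under assumptions: the two small orders (divisibility on {1, 2, 3}, which is not total, and the booleans) are illustrative choices, not the ones in the SVG.

```python
from itertools import product


def divides(x, y):
    """A non-total order on {1, 2, 3}: x <= y iff x divides y."""
    return y % x == 0


def bool_leq(x, y):
    """The two-element order False <= True."""
    return (not x) or y


def product_leq(leq_a, leq_b):
    """Componentwise order: (a, b) <= (a2, b2) iff a <= a2 and b <= b2."""
    return lambda p, q: leq_a(p[0], q[0]) and leq_b(p[1], q[1])


A = [1, 2, 3]
B = [False, True]
leq = product_leq(divides, bool_leq)

pairs = list(product(A, B))
# Every related pair of elements in the product order (including reflexive ones).
relation = {(p, q) for p in pairs for q in pairs if leq(p, q)}
```

Because 2 and 3 are incomparable in the first factor, every pair built on 2 is incomparable with every pair built on 3, which is the part that is easy to get wrong in a drawing.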

## Monotone maps

Consider parallel arrows, two perspectives on the same thing. What metrics are you tracking in your model? Is there a monotone map between them? For example, from F1 to an ILE. Is there a monotone map from the loss to the metrics you are measuring and care about?

The functor from your measure of a module's performance to your overall performance should ideally be monotonic. If it isn't, you won't be able to check module-level performance and be confident that it will correspond to system-level performance. At best you then have a non-ideal compression: one that is not just lossy, but lossy in the wrong way. Said another way, a compression can be "wrong" if it doesn't measure what you are interested in achieving at the next level up.

This is similar to how, when you need to start improving software, you should start by running it at the highest level that's reasonable (or better, one level above what you need to improve/change). You'll be much more immune to getting bogged down in details if you understand the big picture. If you do this, then you'll understand to what degree your module-level performance is a monotonic map to overall performance (what confidence you can have in the map).

If you don't preserve at the module level the aspects of the problem you care about (at the most basic, order), then you may have trouble; how much depends on the degree to which the map fails to be order-preserving.
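One way to put a number on "to what degree" is to count order reversals between paired observations. This is a sketch, not an established procedure: the function name `order_violations` and the metric values below are made up for illustration.

```python
from itertools import combinations


def order_violations(pairs):
    """Count pairs of observations whose order the map reverses.

    `pairs` is a list of (module_metric, system_metric) observations.
    A violation is a pair of observations where the module metric does
    not decrease but the system metric strictly decreases.
    """
    violations = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        if (x1 <= x2 and y1 > y2) or (x2 <= x1 and y2 > y1):
            violations += 1
    return violations


# Hypothetical (module F1, system-level score) measurements.
consistent = [(0.70, 0.61), (0.75, 0.64), (0.80, 0.70)]
inconsistent = [(0.70, 0.61), (0.75, 0.58), (0.80, 0.70)]

n_consistent = order_violations(consistent)
n_inconsistent = order_violations(inconsistent)
```

Zero violations on the observed data is only evidence of monotonicity, not proof; the count just says how often a module-level improvement failed to carry over.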

Can you try to bite off too much before attempting to make changes? That is, start to make changes in a highly complex network when all you understand is how to run e.g. training (e.g. an object detector). No, generally speaking, but that assumes that others haven't already tried to optimize the settings you are tweaking. Said another way, if all you have is "hyperparameters" (settings you don't understand) then you and everyone else will be equally capable of checking performance (assuming your compute resources are the same). You can see humans as computers to get more done, too.

Consider breaking down your single arrow into two or more arrows in series. This corresponds to trying to gain more insight into a model through either measuring an intermediate, or reading the code for intermediates and writing tests around them.

Consider the implications of this for new developers, as well. Ideally we start them at the highest level we consider and they work down to some specific task. If that will take too much time, though, we need to give them some specific task within the larger network, and they will need to trust the optimization task they've been given has value at the next level up.

## Start with evaluations

Should you start with improving metrics in many nets? If you don't have the feature you're measuring in the loss, then any gains you achieve may be temporary if the model changes in other ways (there is no monotone map from the loss to our metrics). But it's good to start with a test before adding the training feature, so we know what difference we would make by including it. Once you have the test, you can work on fixing the map incrementally.

You can also start by measuring something someone else didn't and then argue it's more important than the aspects they had optimized for (e.g. because of changing requirements).

## What is sound and complete?

To understand these terms, you're likely going to need to understand some logic (i.e. a calculus). Propositional calculus is the simplest; see Propositional calculus - Soundness and completeness of the rules to prove these properties on it. As mentioned in Soundness, it seems likely that completeness here is not the same as in Gödel's incompleteness theorems.

## Commutative diagrams

Consider the original commutative diagram, what you think of when you think of the Commutative property:

We assume this expresses something along the lines of \(ab = ba\). But is \(ab\) the 2 top-right arrows, though, or the 2 bottom-left arrows?

It depends on whether you put ⨾ or ∘ in the middle of \(ab\). If you put ⨾ in the middle, then \(ab\) is the top-right arrows. That is, this depends on your convention for composition.

When you posed the question, you implicitly meant × (multiplication); you could have easily meant + (addition) as well. In both cases, you assume a symmetric/commutative operator and so confuse yourself about which is which. You need to go back and forget that these operators are commutative; only then can you ask if they are commutative. If you do that then you would replace the operator ×/+ by e.g. · (as in Monoid) or something else that doesn't imply commutativity to you.

It's likely that · is a poor choice most of the time, however, because it does imply commutativity to many people (just as concatenation did when you posed this question). It also doesn't provide any sense of direction (only that \(ab \neq ba\)) if you want to also interpret the operator as a composition operator. You could come up with your own convention, but that could easily lead to confusion. In Visual Group Theory (page 85) the author defines f·g and f∘g to both equal f⨾g; this is confusing because Wikipedia's convention is now that f∘g is the opposite of f⨾g (so f⨾g = g∘f).

Said another way, you "know" that in general \(a ⨾ b \neq b ⨾ a\) and \(a \circ b \neq b \circ a\). The · operator is more ambiguous and so you have to check the context.
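The two conventions can be pinned down with a few lines of Python. The names `then` (for ⨾) and `after` (for ∘) are hypothetical helpers chosen for this sketch, not standard library functions.

```python
def then(f, g):
    """Diagrammatic order, f ⨾ g: apply f first, then g."""
    return lambda x: g(f(x))


def after(f, g):
    """Applicative order, f ∘ g: apply g first, then f."""
    return lambda x: f(g(x))


def add_one(x):
    return x + 1


def double(x):
    return x * 2


diagrammatic = then(add_one, double)(3)   # (3 + 1) * 2
applicative = after(add_one, double)(3)   # (3 * 2) + 1
swapped = after(double, add_one)(3)       # same function as then(add_one, double)
```

With non-commuting functions like these, the two orders give different answers, and `then(f, g)` agrees with `after(g, f)`, which is the f⨾g = g∘f identity stated above.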

## Arrow categories

## Compression and preservation

Notice the language of models and morphisms between them in Strict 2-category - Doctrines. You should often see morphisms as data, as in higher category theory. This is strangely similar to your "improve improve" documents as well; you treat a "process" as data and construct a second process (an "improve" process) to transform it. You can call all of these processes, n-cells, or n-morphisms (the word doesn't matter).

As discussed in Chp. 1, most of our models are compressions of the natural world. In our brains we maintain causal compressions of the much more complicated computing engine that is nature. In computers we maintain causal compressions of the models in our brain. We expect a CNN to preserve at least some features of our visual cortex:

This compression is always lossy, but can we choose which losses to take? What do you know about your model in nature? That's what you want to preserve in your mental/computer models.

What *can* we preserve now? We discussed preserving joins and meets in Chp. 1, which seems like something you'd want to preserve in almost any model compression. It also seems quite natural to want to preserve composition, as we discussed in Chp. 3 around functors (e.g. preserving order with monotonic functions).

A Linear map preserves vector addition and scalar multiplication.
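To make "preserves" concrete, here is a small numeric check. It is a sketch only: the matrix `M`, the vectors, and the scalar are arbitrary illustrative values, and a check on one example is not a proof of linearity.

```python
def apply(matrix, vector):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def add(u, v):
    """Componentwise vector addition."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Scalar multiplication of a vector."""
    return [c * a for a in v]


M = [[2, 0], [1, 3]]          # hypothetical example map
u, v, c = [1, 2], [4, -1], 5  # hypothetical vectors and scalar

# Mapping a sum equals summing the mapped vectors.
preserves_addition = apply(M, add(u, v)) == add(apply(M, u), apply(M, v))
# Mapping a scaled vector equals scaling the mapped vector.
preserves_scaling = apply(M, scale(c, u)) == scale(c, apply(M, u))
```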

For image processing, see other examples of preservation in machine learning - What is translation invariance in computer vision and convolutional neural network? - Cross Validated. In e.g. top-down lidar imagery, someone might be interested in rotational equivariance.

### Classification

Draw (1.5) mapped to the bools. This whole example could be seen as a classification task. For classification, we typically want translation invariance: moving an object in an image should not change the image's classification.

### Object detection

For object detection we typically want translation equivariance; see Equivariant map. Said another way, this preserves the operation of symmetry transformations. Recognize that a model is different with different inputs applied to it; these are not the same and we can identify at least part of the map between them:
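As a toy check of the difference between equivariance and invariance, consider a 1-D signal with circular shifts. This is a sketch under assumptions: `circular_conv` stands in for a convolutional layer, global max pooling stands in for a classifier head, and the signal and kernel are made up.

```python
def shift(xs, k):
    """Circularly shift a list right by k positions (a 'translation')."""
    k %= len(xs)
    return xs[-k:] + xs[:-k]


def circular_conv(xs, kernel):
    """1-D circular cross-correlation, standing in for a conv layer."""
    n = len(xs)
    return [
        sum(w * xs[(i + j) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def classify(xs, kernel):
    """Global max pooling over the feature map: a toy classification score."""
    return max(circular_conv(xs, kernel))


signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, 2, 1]

# Equivariance: shifting then convolving equals convolving then shifting.
equivariant = (
    circular_conv(shift(signal, 2), kernel)
    == shift(circular_conv(signal, kernel), 2)
)
# Invariance: the pooled score ignores the shift entirely.
invariant = classify(shift(signal, 2), kernel) == classify(signal, kernel)
```

The feature map moves with the input (what a detector needs), while the pooled score does not change at all (what a classifier needs).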

### Adding software complexity

When we add new code to existing software, we often want to maintain all our previous features. Sometimes this is as simple as adding an if statement and handling new cases, directly adding more code paths. The map between the old and new software is obvious in this case, but what if that's not enough? Consider adding an FPN, or converting a single-frame model to multiple frames.

To refactor means to maintain the same behavior while changing the code to support new features. Version control tracks your refactoring. Every commit is a map from the old software to some new version that is "better" in some way, which typically means not losing features. It's critical to have this history when some feature is lost (a bug/defect, specifically a regression). A regression is always associated with some commit, a map between the old and new software. You can't test everything; these maps are critical to understanding what happened when something breaks.
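Because each regression sits at a single commit, it can be located by binary search over the history, which is the idea behind `git bisect`. The sketch below is not how git implements it; the commit names and the `is_good` check are hypothetical, and it assumes that once a commit is bad every later commit is bad too.

```python
def first_bad(commits, is_good):
    """Return the first commit for which `is_good` is False.

    Assumes commits[0] is good, commits[-1] is bad, and that the
    history switches from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical linear history in which the feature broke at "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
broken_since = 5
culprit = first_bad(commits, lambda c: int(c[1:]) < broken_since)
```

The search needs only about log2(n) test runs, but each run tells you less the larger the culprit commit is.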

Smaller commits make it easier to verify that these maps are as expected. That is, incremental changes may *seem* to be a slower path to your goal, but it's often the case that one step of double the size is more than twice as expensive to verify as two smaller steps (for other developers, and oneself). We can also run into working memory limitations if we make our steps too large.

## Composition of relations and matrix multiplication

How are these related? You're familiar with both, but it seems like they sometimes express nearly the same thing. See:

https://en.wikipedia.org/wiki/Composition_of_relations#Composition_in_terms_of_matrices

https://en.wikipedia.org/wiki/Binary_relation#Matrix_representation
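The correspondence those pages describe can be checked directly: composing relations is matrix multiplication where "multiply" is AND and "add" is OR, i.e. matrix multiplication over the quantale of booleans. This is a minimal sketch; the relations `R` and `S` are arbitrary examples.

```python
def compose(R, S):
    """Compose relations R ⊆ X×Y and S ⊆ Y×Z, given as sets of pairs."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}


def to_matrix(R, rows, cols):
    """Boolean matrix with True at (r, c) exactly when (r, c) is in R."""
    return [[(r, c) in R for c in cols] for r in rows]


def bool_matmul(A, B):
    """Matrix product over the booleans: OR of ANDs."""
    return [
        [any(A[i][k] and B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


X = Y = Z = [0, 1]
R = {(0, 0), (0, 1)}   # hypothetical relation from X to Y
S = {(1, 0), (1, 1)}   # hypothetical relation from Y to Z

via_relations = to_matrix(compose(R, S), X, Z)
via_matrices = bool_matmul(to_matrix(R, X, Y), to_matrix(S, Y, Z))
```

Swapping (OR, AND) for another join and monoidal product gives the same formula in other quantales.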

A simple example with 2×2 matrices that shows a connection to the notation of quantales:

The connection to categories:

## Section 4.5.2

*Exercise* 4.65.

In the paragraph above this question, the author defines **1** to have a single object 1. Every pair of objects in a \(\mathcal{V}\)-category must also be assigned an element of \(\mathcal{V}\) (the hom-object), so he also assigns to the single pair, which plays the role of the identity morphism on 1, the object *I*.

A \(\mathcal{V}\)-profunctor \(\rho_\mathcal{X}: \mathcal{X} \times \textbf{1} ⇸ \mathcal{X}\) is a \(\mathcal{V}\)-functor \(\rho_\mathcal{X}: (\mathcal{X} \times \textbf{1})^{op} \times \mathcal{X} \to \mathcal{V}\). Because \(\mathcal{X}\) is enriched in \(\mathcal{V}\), we know that \(\mathcal{X}(x,y)\) is an object of \(\mathcal{V}\). Let's define \(\rho_\mathcal{X}(x,1,x') := \mathcal{X}(x,x')\) (an isomorphism as required).

Similarly, a \(\mathcal{V}\)-profunctor \(\lambda_\mathcal{X}: \textbf{1} \times \mathcal{X} ⇸ \mathcal{X}\) is a \(\mathcal{V}\)-functor \(\lambda_\mathcal{X}: (\textbf{1} \times \mathcal{X})^{op} \times \mathcal{X} \to \mathcal{V}\) defined by \(\lambda_\mathcal{X}(1,x,x') := \mathcal{X}(x,x')\).

*Exercise* 4.66.

How does this selection of a dual make sense in terms of \(\textbf{Prof}_\textbf{Bool}\)? What does the categorical product \(\mathcal{X}^{op} \times \mathcal{X}\) look like?

Every object in \(\textbf{Prof}_\mathcal{V}\) should have a dual. Writing the relevant morphisms as \(\mathcal{V}\)-profunctors:

Writing the relevant morphisms as \(\mathcal{V}\)-functors:

## Algebraic structures

See:

Here's an example to anchor off of (could update to File:magma to group.svg as well):

In Algebra [cats] and Outline of algebraic structures the same information that this preorder (File:Magma to group4.svg) expresses is provided in a table. Still, the colors of the arrows are helpful to indicate what structure is being added. That is, a typical preorder only has booleans (an arrow or not) between elements, while this colored drawing allows for 3 (and in general more) properties to be quickly recalled.

You could see the nodes/objects as categories and the arrows as coming from a set of algebraic features. That is, this presents the category of small categories enriched in certain algebraic features. You could add an arrow for commutative (selected yellow below) and add a Commutative monoid to the drawing. This is definitely collecting butterflies; you should also consider which structures are important or common.

In this context a colored arrow means *adds* some structure specific to the color. For example, in File:Magma to group4.svg a Monoid with invertibility is a Group. Follow the arrows in reverse for an "is a" relationship followed by "with"; e.g. a Group is a Semigroup with invertibility and identity.

It's no mistake that this drawing is of a partial order with 8 elements. We could construct it by taking the product of 3 two-element partial orders, with each two-element partial order representing the addition of some property/adjective (a checkbox in the table representations).
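That construction is short enough to write out. In this sketch each node is the set of properties that are "checked", and each arrow adds exactly one property. Only the four names mentioned in the text are filled in; the other four nodes are left unnamed here rather than guessed.

```python
from itertools import combinations

PROPERTIES = ("associativity", "identity", "invertibility")

# One node per subset of properties: the product of three two-element orders.
nodes = [
    frozenset(subset)
    for size in range(len(PROPERTIES) + 1)
    for subset in combinations(PROPERTIES, size)
]

# One arrow for each way of adding a single property (the covering relation).
arrows = [(a, b) for a in nodes for b in nodes if a < b and len(b - a) == 1]

# The "color" of an arrow is the single property it adds.
colors = {(a, b): next(iter(b - a)) for a, b in arrows}

# Names taken from the text above; the remaining nodes are intentionally unnamed.
NAMES = {
    frozenset(): "Magma",
    frozenset({"associativity"}): "Semigroup",
    frozenset({"associativity", "identity"}): "Monoid",
    frozenset(PROPERTIES): "Group",
}

monoid = frozenset({"associativity", "identity"})
group = frozenset(PROPERTIES)
monoid_to_group = colors[(monoid, group)]
```

The result is the cube: 8 nodes and 12 arrows, 4 of each color.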

### Structure, space, or algebra?

Although all these words show up regularly in the study of algebraic structures, they are essentially synonyms: each names a "structure" in the sense of a set or sets with an operation or operations defined on them (the article Mathematical structure limits the definition of structure to being defined on only one set). Don't be bothered by the inconsistency in language that often occurs. We'll use AS for Algebraic structure in the following, since it's becoming the anchor word.

See also Structure (mathematical logic).

#### Space vs AS

Why do we have all these nearly equivalent words? The terms "space" and "structure" exist alongside each other primarily for historical reasons; see Space (mathematics). In practice this means you'll see the term "space" more often when a discussion turns to geometric rather than algebraic concerns. We are straddling this line in Vector (mathematics and physics) with the quote:

A vector space formed by geometric vectors is called a Euclidean vector space, and a vector space formed by tuples is called a coordinate vector space.

You'll see the same Euclidean/coordinate (i.e. space/structure) language explained in Real coordinate space.

Similarly, the article Group action feels the need to awkwardly state "space or structure" early on and often switches between the words.

#### Algebra vs AS

The article Commutative ring states that commutative algebra is about the study of commutative rings, which are typically described as algebraic structures (not algebras). This is because the term "algebra" without an article refers to a "field of study" in mathematics; see Algebra (notice this article is written in 162 languages). That is, the term "algebra" in this context is useful to avoid the verbose "field of study" or "broad part of mathematics" someone would otherwise need to use. Why not use the word "math" though? As in "linear math" (linear algebra) or "abstract math" (abstract algebra)? Most likely, only to sound fancy.

This language gets especially confusing in cases like the following from Algebra:

Sometimes, the same phrase is used for a subarea and its main algebraic structures; for example, Boolean algebra and a Boolean algebra.

With an Article (grammar), an Algebra over a field is simply an example of an algebraic structure. These are the most "structured" or complicated algebraic structures because they stack even more operations on top of vector spaces. The fact that an Algebra over a field is the starting point on Wikipedia is a bit unfortunate because it seems like most examples easily generalize to an Algebra over a ring (e.g. Associative algebra).

Why do we often use the letter \(K\) for a field rather than \(F\)? One possibility (or at least a mnemonic) is that the "c" in "vector" sounds like K, and a "vector" space is defined over a field. The (historical) language of "algebra over a field" may be related to this: the concept of an "algebra over a field" is an extension of a vector space, which seems to often be conflated with the concept of a field. We could read this as an "algebra over a vector space" instead, and it's likely most people would understand what was meant (though this is even more verbose).

### Two binary operations

These are the algebraic structures with "One binary operation on one set"; can you do the same for structures with two operations? Start with the same colors as File:Magma to group4.svg (RGB), but with dark colors for multiplication (×) and light colors for addition (+). We'll expand on the operations in Algebra [cats].

The corresponding category is marked to the bottom left of some definitions, where you'll find the rules for preserving structure between different examples (in the definition of the homomorphism). The arrows between these categories indicate "full subcategory" rather than the addition of some property/constraint/structure.

Start with the smallest possible examples (2-3) in every case (e.g. ℤ/4ℤ), so that you can potentially provide drawings. This is also part of avoiding the infinite. You should strive to provide non-trivial examples, though, which makes this harder than simply providing the smallest possible example.
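Small finite examples have the advantage that their properties can be checked by brute force. The sketch below does this for the two operations on ℤ/4ℤ; the function name `properties` and the dictionary keys are illustrative choices.

```python
from itertools import product


def properties(elements, op):
    """Brute-force which properties a binary operation has on a finite set."""
    elements = list(elements)
    associative = all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    )
    # Two-sided identity elements; there is at most one.
    identities = [
        e for e in elements if all(op(e, a) == a == op(a, e) for a in elements)
    ]
    identity = identities[0] if identities else None
    # Every element needs a two-sided inverse with respect to the identity.
    invertible = identity is not None and all(
        any(op(a, b) == identity == op(b, a) for b in elements) for a in elements
    )
    commutative = all(
        op(a, b) == op(b, a) for a, b in product(elements, repeat=2)
    )
    return {
        "associativity": associative,
        "identity": identity is not None,
        "invertibility": invertible,
        "commutativity": commutative,
    }


Z4 = range(4)
addition = properties(Z4, lambda a, b: (a + b) % 4)
multiplication = properties(Z4, lambda a, b: (a * b) % 4)
```

Addition mod 4 has all four properties, while multiplication mod 4 has everything except invertibility (0 and 2 have no inverses), so the same four-element set lands on two different nodes of the one-operation diagram.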

If you take "is a" to mean "has inside it all examples" then you'd get a bunch of examples of everything by just going up the "is a" chain. It's a "subset of" relationship as well.

When you link to an example, link to the longest possible explanation of the example you can find (not just where you originally found it).

Prefer the term "Noncommutative ring" to be more specific, since in some contexts "Ring" may imply a commutative ring. Prefer the term "Semiring" to "Rig" only because the former is more common and consistently used on Wikipedia; it's also much easier to quickly search for ("rig" is a part of many words, requiring whole word search). But see semiring in nLab and rig in nLab.

We "generalize" when we go from thinking about specific examples in a category to thinking in terms of the category (what structure all the examples have in common), and we also "generalize" when we remove property/constraints/structure (following arrows in the reverse direction).

Notice you've already started on the "One set with no binary operations" diagram in Exercise 5.10, with FinSet, FinRel, etc.

Said another way, the "is a" relationship can hold because one object has more constraints (more structure) on it than something it is an instance of (Y "is a" X because all examples of Y have more structure than all examples of X). The "is a" relationship can also hold because Y is simply an example of X (not thinking of the potential many examples of Y and X). We could think more deeply and put boxes in boxes; but between the examples in separate boxes there is presumably no relationship (no morphisms) unless you go up to a common level and use the morphisms at that level. That is, use e.g. prop functors, monoidal functors, or functors to preserve what structure the two objects do share.

### Properties of categories

Consider the following diagram, now one level "up" in the sense of thinking about properties/constraints/structure you can add to categories. But is it part of **Cat** if it includes **Cat**?

It could be drawn similar to the algebraic diagrams above. In both cases we are adding something with all of our arrows, whether we are assigning new properties or assigning arguments (calling constructors, so to speak). In the first case we add a property to all examples in a set/class ("modify" the set/class relative to its previous definition); in the second case we add a property to only an element ("modify" the element relative to its previous definition).

This drawing assumes some categories (e.g. **Rel**) are only defined as monoidal categories in one way. In fact, there are often multiple ways to define a category as monoidal (different options for the monoidal product). It should include what monoidal product is being used in its examples.

The link https://en.wikipedia.org/wiki/Compact_category redirects to Autonomous category. Which term do you prefer? With rigid, it seems there are three now:

For more on the term compact, see machine learning - Can neural networks approximate any function given enough hidden neurons? - SO and Compact space.

This diagram started as a list of common enriched categories (see Enriched category).

### All examples

Be careful with the word "example" (which shows up all over these notes). Is a collection of examples an example of something? If you're using the word analogously to "element" in a set, then you're going to run into Russell's paradox if you consider a collection of examples as an example in some other collection of examples.

Instead, invent words to create collections of collections of examples. Call them sets, categories, collections, Class (set theory), Conglomerate (mathematics), etc. until you're sick of coming up with words. Or start using 0-category, 1-category, 2-category, etc. as in n-category. From that page:

Especially as n increases, there is a plethora of different definitions of n-categories, some differing in generality others different-looking but secretly equivalent. A (woefully incomplete) list is given below, with pointers to dedicated entries. Part of the subject of higher category theory is to understand, organize, systematize and, last not least, apply these definitions. (It is the "n" in "n-category" that gives the nLab its name.)

However, the definitions do seem to agree that a 0-category is a set, and a 1-category is a regular category.

In SVG you should link boxes to the word you want to be using for the collection (boxes can be linked, just like text). You could even provide a separate TOC, and color boxes (perhaps shades of gray) for the word that should be used with them.

It seems like the word "collection" is the most informal and therefore a good default (for unlinked boxes). See:

Perhaps a "collection" is something that must be constructed one-by-one, not defined via properties. That is, the word "example" may be appropriate for a collection. Of course, what makes an example belong to your collection? If you don't have a rule, you could put anything into it. In some sense this is what gray boxes are; there's only a notion that these belong together.

See Is there a category of categories?. You could draw a large CAT box around your whole drawing, but it probably wouldn't be helpful. Similarly you could draw a large Algebraic structure box around a large section of these notes, because this term is again rather general.

Many web APIs use tags (think hash tags in twitter, labels in Gmail, tags for cost-tracking in AWS) to make it quick and easy to put new elements into a set. The Web API then provides a single list of tags (i.e. types in a type system).

It seems like you are running into nearly the same issue when organizing into directories: do you see each directory as a set? You can make everything much simpler if you have a single namespace, as Jupyter Book suggests that you do. Why not use JB tags to avoid directories?

### Managing definitions

Working through this exercise, it's clear that there are so many definitions you're going to struggle to get them all on one diagram. As discussed in Ring (mathematics), many authors define a ring differently depending on the context. And why not? These definitions (like any structure) should only be evaluated relative to your objectives.

Even if you *wanted* to define global terms, the terms are going to get incredibly long as you add more and more adjectives. You'll need more and more adjectives (or alternatively, to invent new words) only because your namespace is going to fill (requiring rework of your drawing, as well). It's critical to delete/archive code for the same reason. Still, you need a large vocabulary to be able to interpret as much as possible (as long as you're given context). Your notes are your own vocabulary; you may need to invent new words (or make longer words) in them so you can actually pull together logic that was previously in different contexts.

For example, prefer the term "Noncommutative ring" internally to be more specific, though this will usually correspond to the unadorned "Ring" when you encounter that term.

### Major connections

### Other

Consider finding the same in a programming language as well; see:

Perhaps you still need your own in your own notes, to include e.g. Quantale? In your own notes you want to see what you've understood in the past so you can think about what might be easy to construct from what you already know. In general, this is true for most drawings (you only include on them what you understand).

How do you use categorical logic in Python? Much may need to be custom; see Computational Category Theory in Python I: Dictionaries for FinSet | Hey There Buddo!. However, this mentions the promising Welcome to Hypothesis! - Hypothesis 6.71.0 documentation. See also Computational Category Theory in Python III: Monoids, Groups, and Preorders | Hey There Buddo!.