Search the web#
Estimate value#
A specialization of Expand focus specific to the problem of finding external pedagogical materials.
Design test#
The result of this process should be a list of resources that score highly on measures of both relevance and quality. See Evaluate pedagogical tool for the "quality" scoring criteria used here. For other potential measures see Quality (business) and Data quality. See also Relevance (information retrieval).
Said another way (and simplifying to only a binary relevance metric), this process should provide higher precision searches because you are applying extra filters that narrow the results down to a higher quality subset of the internet. See Precision and recall - Definition (information retrieval context). See also Evaluation measures (information retrieval).
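As a rough illustration of the precision claim above: with binary relevance, precision is the fraction of retrieved results that are relevant. The sketch below uses made-up result names and a made-up blocklist (all hypothetical, not taken from any real search) to show how filtering out low quality sites can raise precision without finding any new relevant documents.

```python
def precision(results, relevant):
    """Fraction of retrieved results that are relevant (binary relevance)."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r in relevant)
    return hits / len(results)


# Hypothetical search results; the names are placeholders, not real sites.
unfiltered = ["a", "b", "c", "d", "e", "f", "g", "h"]
relevant = {"a", "c", "f"}

# Hypothetical blocklist of low quality or paywalled sources.
blocked = {"b", "d", "e", "g"}
filtered = [r for r in unfiltered if r not in blocked]

print(precision(unfiltered, relevant))  # 0.375
print(precision(filtered, relevant))    # 0.75
```

The number of relevant documents is unchanged (three in both cases); only the denominator shrinks, which is why the extra filters show up as higher precision.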
Leave negotiability#
Should your stopping point be a certain number of documents? It may be better to provide a range, so you can stop based on how the search is going. Was the last document you discovered better than the previous? How much better? How much extra time did it take to find?
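One way to make a "range" stopping point concrete is sketched below. The thresholds (`min_docs`, `max_docs`, `min_gain`) and the idea of scoring each document on a 0 to 1 scale are assumptions for illustration, not a rule from these notes: never stop before the lower bound, always stop at the upper bound, and in between stop once the latest document fails to beat the best earlier one by a meaningful margin.

```python
def should_stop(scores, min_docs=3, max_docs=8, min_gain=0.05):
    """Decide whether to stop searching.

    scores: scores of the documents found so far, in discovery order.
    min_docs / max_docs: the range of documents you are willing to collect.
    min_gain: how much better the newest document must be than the best
        earlier one to justify continuing (hypothetical threshold).
    """
    n = len(scores)
    if n < min_docs:
        return False  # below the range: keep searching
    if n >= max_docs:
        return True   # top of the range: stop regardless
    gain = scores[-1] - max(scores[:-1])
    return gain < min_gain


print(should_stop([0.5, 0.6]))            # False
print(should_stop([0.5, 0.6, 0.9]))       # False
print(should_stop([0.5, 0.6, 0.9, 0.7]))  # True
```

To account for the time question as well, you could divide the gain by the minutes it took to find the last document and compare that rate against a threshold instead.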
You can't be sure a question, even if it matches your question word for word, is being asked in the same context or has a sufficiently detailed answer. All results are only probabilistic; as you work through this limited set of search results you'll need to continue to decide which to focus on.
There's always a balance between finding a resource that perfectly answers your question (high relevance), and a resource that is high quality. You should "score" every resource by both its relevance and its general quality. How you weigh relevance and quality should depend on the relative confidence you have that the question you're working on is more important than other related questions. When you were picking between different potential questions to answer, how did the question weights differ? Did you actually come up with the question because you noticed an article on it somewhere (found an interesting question while exploring)? Said another way, if you find a high quality resource that answers a related question, should you be opportunistic and read it instead?
How much you favor high-relevance documents over high-quality documents will depend on how focused your research is in the moment (more focused rather than basic/exploratory will favor high-relevance documents i.e. those you already understand). See Set goal. However, a high quality resource may still be a cheaper solution only because it provides better documentation.
A perfect system would provide you with a document that exactly answered your query, but that would be the equivalent of asking an educated person for the answer (expensive). Not only would the person need to address the specific question, they'd have to answer it in words you understand. In some sense, they'd be doing your job (sounds good, but rare).
What's the value of web searching relative to reading? How much time should you spend searching relative to the time you spend reading? You aren't learning much when you web search; it's a rather shallow process. You will typically get more from a conversation with a single webpage than a conversation with your search engine (i.e. put slightly more weight on high quality documents). For example, it's often a relief to go on vacation and only be able to read documents rather than weigh them.
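The relevance versus quality trade-off can be sketched as a weighted sum. Everything here is an assumption for illustration: the 0 to 1 scales, the linear combination, the `relevance_weight` parameter, and the two example resources are all hypothetical, and you may prefer a different way of combining the two scores.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    relevance: float  # 0..1, how directly it answers your question
    quality: float    # 0..1, general quality of the resource


def combined_score(resource, relevance_weight):
    """Linear blend; relevance_weight near 1 means focused research."""
    return (relevance_weight * resource.relevance
            + (1 - relevance_weight) * resource.quality)


def rank(resources, relevance_weight):
    """Sort resources from highest to lowest combined score."""
    return sorted(resources,
                  key=lambda r: combined_score(r, relevance_weight),
                  reverse=True)


# Hypothetical candidates with made-up scores.
candidates = [
    Resource("Exact-match forum answer", relevance=0.9, quality=0.4),
    Resource("Well-edited article on a related question",
             relevance=0.5, quality=0.9),
]

focused = rank(candidates, relevance_weight=0.8)
exploratory = rank(candidates, relevance_weight=0.3)
print(focused[0].title)      # Exact-match forum answer
print(exploratory[0].title)  # Well-edited article on a related question
```

With a high relevance weight the exact match wins; with a low one the opportunistic, higher quality read wins, which mirrors the focused versus exploratory distinction above.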
Estimate cost#
See also Document retrieval.
Apply default filters#
We all know paywall sites that we ignore in search results because they may not even let you see what you want to see. Why not filter them to avoid wasting any attention on them? Use uBlacklist to block websites like this, or to filter websites you've checked the license on using the process in Evaluate pedagogical tool.
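If your list of sites to block grows, it may be easier to keep plain domain names in one place and generate the rules from them. The sketch below assumes uBlacklist's match-pattern rule form (`*://*.example.com/*`, which covers a domain and its subdomains); the helper name `ublacklist_rules` and the example domains are hypothetical.

```python
def ublacklist_rules(domains):
    """Turn domain names into one match-pattern rule per line.

    Domains are normalized (trimmed, lowercased), deduplicated, and sorted
    so the generated list is stable between runs.
    """
    cleaned = {d.strip().lower() for d in domains if d.strip()}
    return "\n".join(f"*://*.{domain}/*" for domain in sorted(cleaned))


# Placeholder domains; substitute the sites you actually want to hide.
rules = ublacklist_rules([" Paywalled.example ", "paywalled.example",
                          "news.example"])
print(rules)
```

The printed text can then be pasted into the extension's blocklist editor; check the uBlacklist documentation for the rule syntax your version supports.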
Focus your search#
The web is a huge place, and trying to search all of it using only a few words is unlikely to produce the documents you are interested in. Why? Your search tool doesn't have enough information about how you score documents. Every word you type in your search is a filter in some sense, though in that case the terms can interact. The longer your query, the more likely you'll get what you want.
Are you answering a question? If so, be sure to write out at least all the keywords in the question. You can skip generic words like "What" and "Why" if you want, but these do help search.
See the "Search Operators" in DuckDuckGo Search Syntax | DuckDuckGo Help Pages for generic tips for improving your results.
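A minimal sketch of turning a written question into a query, under the assumption that you simply keep every word and only optionally drop generic ones. The `GENERIC_WORDS` set is an illustrative guess that extends the "What" and "Why" examples above; it is not a standard stop-word list.

```python
import re

# Illustrative list only; extend or shrink it to taste.
GENERIC_WORDS = {"what", "why", "how", "is", "a", "an", "the"}


def build_query(question, drop_generic=False):
    """Keep the words of a question as search terms, stripping punctuation.

    By default every word is kept, since generic words can still help
    the search engine.
    """
    words = re.findall(r"[A-Za-z0-9'-]+", question)
    if drop_generic:
        words = [w for w in words if w.lower() not in GENERIC_WORDS]
    return " ".join(words)


question = "What is a fully convolution network?"
print(build_query(question))                     # What is a fully convolution network
print(build_query(question, drop_generic=True))  # fully convolution network
```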
Search by website#
Once you've taken the time to build a solid query, reuse it. Use the same query on several websites you prefer to work with, listed here in order of preference.
Useful research websites in a rough priority order based on Evaluate pedagogical tool:
Stack Exchange#
See DuckDuckGo !Bang for one way to filter by website. The downside to bangs is that they (often) use the scoring method provided by the website, which is typically inferior to a general search engine's algorithms (i.e. DuckDuckGo). If you don't want to type out `site:` (you must be quite busy) an alternative is to simply add the domain as a search term (e.g. fully convolutional network stackexchange at DuckDuckGo). For example, try to find What is a fully convolution network? - AISE using both `!sx` and using the previous search; DuckDuckGo helps you find this reasonably high quality resource much faster. Bangs you do not use, for this reason:
!sod (SO with DuckDuckGo) or !ov (SO with its own search)
!sx
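Reusing one query across your preferred websites with the `site:` operator can be scripted. This sketch assumes the common `https://duckduckgo.com/?q=` URL form; the `PREFERRED_SITES` list and the function name are placeholders for your own choices.

```python
from urllib.parse import urlencode

# Placeholder list in rough priority order; substitute your own sites.
PREFERRED_SITES = ["stackexchange.com", "en.wikipedia.org"]


def site_search_urls(query, sites=PREFERRED_SITES):
    """Build one DuckDuckGo search URL per site, each restricted by site:."""
    return [
        "https://duckduckgo.com/?" + urlencode({"q": f"{query} site:{site}"})
        for site in sites
    ]


for url in site_search_urls("fully convolutional network"):
    print(url)
```

Because the general search engine does the ranking, this keeps the benefit described above while still limiting results to one website at a time.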
You can also search SE by tag. See for example Highest scored "dimensionality-reduction" questions - CV, or Highest scored "keyboard-shortcuts+firefox" questions - SU to search by two tags.
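Tag searches like these can be bookmarked by constructing the URL directly. The sketch below assumes the `/questions/tagged/<tag1>+<tag2>` path used by Stack Exchange sites and a `tab=Votes` parameter for sorting by score; treat the query parameter in particular as an assumption and verify it against the URL your browser shows.

```python
def tagged_questions_url(site_domain, tags, tab="Votes"):
    """Build a URL listing questions that carry all of the given tags."""
    tag_path = "+".join(tags)
    return f"https://{site_domain}/questions/tagged/{tag_path}?tab={tab}"


print(tagged_questions_url("stats.stackexchange.com",
                           ["dimensionality-reduction"]))
print(tagged_questions_url("superuser.com",
                           ["keyboard-shortcuts", "firefox"]))
```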
Why Stack Exchange? It's editable, and is CC BY-SA like Wikipedia (see What is the license for the content I post? - Help Center - SO). It also keeps track of your activity to some degree; however, see my answer in How can I view vote history from all Stack Exchange sites? - MSE. Try to make a comment or add a bookmark when you've made a significant investment in understanding a question and answer; this way you can close the tab but also get back to it quickly days later if it connects to another topic (but otherwise forget it). If you have many links you want to provide, consider creating your own answer. See the sxda bookmark for [User davidvandebunte Stack Exchange - activity](https://stackexchange.com/users/309544/davidvandebunte?tab=activity).
Wikipedia#
!w
!wt
Why Wikipedia? It's editable and CC BY-SA. Not only that, it's seen as authoritative. Different authors may provide different definitions, and it either picks one or discusses the alternatives. When it does pick one, it's likely that you'll see authors converge to Wikipedia's definition, in the same way that natural language evolved by "vote" (a word means what it means to others). To look up a page you recently edited see User contributions for Davidvandebunte - Wikipedia.
A simple way to improve an article is by adding a citation. If this isn't possible, consider adding a Wikipedia:Citation needed or a similar simple tag.
Papers with code#
!pwc (uses the search built into paperswithcode.com)
Use Papers With Code (`pwc`) to find the SOTA on the benchmark that most closely matches your workload. Even if you don't use the SOTA because it's e.g. too expensive, the list should include all models evaluated on the dataset.
Programming references#
!py
!sympy
!pytorch
!wolf
Personal bangs#
!maps
!gcal
!drive
!e
!a
Explore by website#
Have you found "gold" on certain websites in the past? You may want to mine them by e.g. their own scoring criteria or by simply browsing what's available.
One major downside to this approach is that you aren't coming to the material with your own question, asking what you want of it (fitting it into your own knowledge) rather than having it tell you what is valuable or useful. By definition, it's also widening your focus, taking you away from whatever you were currently working on. It can also be hard to remember when you last "mined" a website; what is new?
Some of these "blogs" are also written like advertisements. They are designed to cause you to panic; the authors talk as if they are about to hit AGI because that's what they want their stakeholders to think. If you see impressive results, investigate how they were achieved and you'll be less impressed.
Still, don't hate their success. They are successful because they are taking the time to demo, unlike many authors. They're learning fast. Reading what they're doing has been one great way to get yourself to stop maintaining large bodies of your own notes; you've often seen them doing work you've already been considering (such as writing source code) and their advertisements drive you to learn new topics (e.g. Transformers). To some extent these sites have earned your priors, like a brand.
Semantic Scholar (`!s2` uses the search built into semanticscholar.org) only provides articles that aren't behind a paywall. This site makes it easier to follow references in papers by turning them into links. See:
Some websites don't seem searchable via DuckDuckGo; you must go to them specifically. Use Connected Papers to find high-scoring papers that are closely related to a topic you are already focusing on. Click on papers to highlight them in the graph; zoom in so nodes don't overlap so much. You can only build 5 free custom graphs (with multiple origin papers) on this site; see Connected Papers | Pricing.
Other resources:
Limit the number of websites you regularly visit, to make relearning from them easier and focus your investments. To learn to use a website is an investment in itself, such as learning how to insert a reference on Wikipedia, where to find activity history, or how to construct links to specific answers on SE. If you need to relearn a topic, it will help to be able to relearn from an easily accessible website (one that shows up quickly in search). Websites even have certain layouts that you get used to and that therefore lead to less "surprise" when you go to a completely new page on the same site. That is, you'll hit a memory-guided Saccade more often.
Wikipedia#
The equivalent of high scoring articles on Wikipedia is "Good" or "Featured" articles. It's much harder to judge which of these are worth reading, and there's a higher barrier to entry for influencing how an article is ranked (who knows how?). See for example:
Stick to your keyboard#
Use `H` and `L` rather than the left and right arrows on your keyboard so you don't lose your hand's home position.
Copy text#
One option is to prefer Vimium's `/` then `n` (see [Find Mode | Vimium][fm]) to searching with `Ctrl-F` (or `F3`). The copying process:
After reaching the text, hit `v` to enter [Visual Mode | Vimium][vm].
Use any of the characters mentioned in the documentation (e.g. `w`, `e`) to select text.
Hit `y` to copy.
Hit `N` to go back in your search; `n` does not wrap back to the top. Use `c-[` rather than `Esc` with Vimium. You can also hit `v` then use `/` to search (opposite order). The downside to this approach is that Vimium's search doesn't highlight all instances of a word.
You can also use `F7` to enable "Caret browsing" in Firefox (see Keyboard shortcuts - Perform common Firefox tasks quickly | Firefox Help). Vimium also has a caret mode where you can use keys from vim rather than the arrow keys. Hit `c` from visual mode.