Ways to search stuff on the net

2014-03-06, 08:17:52

In the early days of Google, the saying "Google is your friend" became pretty much a proverb. Google is actually still good for finding all kinds of things on the web.

Sometimes I need to find things on a specific website. The website may have its own search function, but it works like crap. This is when I resort to Google with a query like site:thedndsanctuary.eu search terms.
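The site: restriction is simply part of the query text, so it can also be put straight into a search URL. A minimal sketch, assuming Google's long-standing https://www.google.com/search?q=... URL format (Google can change it at any time); the function name is made up for illustration:

```python
from urllib.parse import quote_plus


def site_search_url(site: str, terms: str) -> str:
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"site:{site} {terms}"
    # quote_plus turns spaces into '+' and ':' into '%3A'
    return "https://www.google.com/search?q=" + quote_plus(query)


print(site_search_url("thedndsanctuary.eu", "search terms"))
# https://www.google.com/search?q=site%3Athedndsanctuary.eu+search+terms
```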

There's a problem of principle with internet search engines. They don't show the real web. They show the database accumulated by their own web crawlers. What the crawler has not visited, you don't see. This problem applies to all search engines, including Google. And the visible results are ranked according to filtering rules and algorithms that may or may not be transparent. Still, web search engines mostly work.

There used to be a network called Gopher. It had its own search function. When you search Gopher, what you see are Gopher results, not results from wherever some web crawler has had the time or desire to visit. Gopher is used for digital library catalogues here. Btw, the Opera browser supports the Gopher protocol. Awesome.

Then there are web search combines or combiners. The one I used to know and worked with on Windows was Copernic. Basically, it included a bunch of search engines (as browsers do these days), performed the search in all of them at the same time, and then listed the results in a pleasant, colourful list. Fantastic.
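The core idea of such a combiner can be sketched as: send the same query to several engines, then merge their ranked lists into one, dropping duplicates. This is not Copernic's actual method, which isn't known here; the engine names and result lists below are invented stand-ins for real search backends, and the merge is a simple round-robin interleave chosen for illustration:

```python
from itertools import zip_longest


def aggregate(results_by_engine: dict[str, list[str]]) -> list[str]:
    """Interleave ranked result lists round-robin, skipping URLs already seen."""
    merged, seen = [], set()
    # Take rank 1 from every engine, then rank 2, and so on.
    for tier in zip_longest(*results_by_engine.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical results from three engines for the same query
results = {
    "engine_a": ["a.example/1", "shared.example", "a.example/2"],
    "engine_b": ["shared.example", "b.example/1"],
    "engine_c": ["c.example/1"],
}
print(aggregate(results))
# ['a.example/1', 'shared.example', 'c.example/1', 'b.example/1', 'a.example/2']
```

Round-robin keeps every engine's top hits near the top; a real aggregator would probably also weight engines or score pages that several engines agree on.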

How do you do it? Do you look at results in multiple search engines to give yourself the sense of doing proper research, or are you the "I'm Feeling Lucky" type?

Re: Ways to search stuff on the net

Reply #1 – 2014-03-06, 09:56:37

You know what your problem is? You use too many letters.
My advice is to cut them down somehow, like keywords in a search query.

Re: Ways to search stuff on the net

Reply #2 – 2014-03-06, 10:01:59

A fair amount of the stuff I need is hidden behind paywalls (which I access through a VPN) or only available in paper form. Still, Google Scholar is worth an almost obligatory look or two to see if it has anything of interest.

Most of my searches are in specialized resources and directories, even if that means something as (seemingly) basic as Wikipedia. What's the point of searching in Google if the first or second result is Wikipedia anyway, and the other sources are likely to be worse? Wikipedia articles tend to have plenty of manually assembled, potentially interesting references to pursue.

Quote from: ersi on 2014-03-06, 08:17:52

Then there are web search combines or combiners. The one I used to know and worked with on Windows was Copernic. Basically, it included a bunch of search engines (as browsers do these days), performed the search in all of them at the same time, and then listed the results in a pleasant, colourful list. Fantastic.

Re: Ways to search stuff on the net

Reply #3 – 2014-03-06, 10:38:02

Quote from: ersi on 2014-03-06, 08:17:52

There's a problem of principle with internet search engines. They don't show the real web. They show the database accumulated by their own web crawlers. What the crawler has not visited, you don't see.

Known issue.
http://en.wikipedia.org/wiki/Deep_Web etc.

Quote from: ersi on 2014-03-06, 08:17:52

How do you do it? Do you look at results in multiple search engines to give yourself the sense of doing proper research, or are you the "I'm Feeling Lucky" type?

Neither.
I use plain search, then advanced search if needed, trying to pick the right keywords (bearing in mind that the pages I'm after must contain one of those words, or those exact words, to show up).
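Advanced search forms mostly just write operators into the query for you. A minimal sketch, assuming the usual Google operators (quotes for an exact phrase, OR between alternatives, a minus sign to exclude a word); the function and its field names are hypothetical, loosely modelled on the advanced search form:

```python
def advanced_query(all_words: str = "", exact_phrase: str = "",
                   any_words: str = "", none_words: str = "") -> str:
    """Compose a plain query string from advanced-search style fields."""
    parts = all_words.split()                       # every word must appear
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')           # exact phrase in quotes
    if any_words:
        parts.append(" OR ".join(any_words.split()))  # at least one of these
    parts += [f"-{w}" for w in none_words.split()]  # exclude these words
    return " ".join(parts)


print(advanced_query(all_words="gopher library", exact_phrase="union catalogue",
                     any_words="Finland Estonia", none_words="rodent"))
# gopher library "union catalogue" Finland OR Estonia -rodent
```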

Re: Ways to search stuff on the net

Reply #4 – 2014-03-06, 12:14:40

Quote from: ersi on 2014-03-06, 08:17:52

Then there are web search combines or combiners.

Now I remember the term I had actually seen earlier: search aggregators. Copernic was a perfect example of one.

Copernic also has/had other products:

- Summarizer, to shorten texts. It used some supposedly linguistic algorithm. It worked best for English, of course, but more languages were available: French, German, Spanish, ...
- Desktop Search, to collect and search emails, contact lists, files on the hard disk and the internet, either aggregated or filtered in various ways. This is the niche Google soon moved into with its own short-lived Desktop Search thingie.
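Desktop search tools of this kind generally build an index ahead of time instead of scanning every file on each query. A minimal inverted-index sketch; the items below are made-up stand-ins for mails, contacts and files, the function names are hypothetical, and this is not a description of how Copernic Desktop Search was implemented:

```python
import re
from collections import defaultdict

# Hypothetical local items a desktop search tool might collect
DOCS = {
    "mail/0001": "Meeting about the library catalogue on Friday",
    "contacts/alice": "Alice Example, Tallinn",
    "files/notes.txt": "Notes on the Gopher catalogue at the library",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of item IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def find_items(index: dict[str, set[str]], query: str) -> list[str]:
    """Return IDs of items containing every query word (AND semantics)."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []


idx = build_index(DOCS)
print(find_items(idx, "library catalogue"))  # ['files/notes.txt', 'mail/0001']
```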

Interesting what ideas software companies came up with to deepen people's search madness.

By the way, Manjaro Linux currently ships with a desktop search tool for files and programs called Synapse. I generally like Manjaro's defaults a lot, but Synapse seems too heavy and unnecessary to me. I replace it with Gmrun to have a search box for programs rather than files. My personal files are orderly enough.

Re: Ways to search stuff on the net

Reply #5 – 2014-03-06, 20:45:27

Quote from: kardon on 2014-03-06, 20:11:24

Google still has the problem of indexing sites that are closed to most people, but allow the Google crawler bot access.

Problem or feature? You might be able to use the cached page to your advantage.

Re: Ways to search stuff on the net

Reply #6 – 2014-03-06, 21:04:05

Quote from: kardon on 2014-03-06, 20:11:24

Isn't Gopher comparable to searching one or a few sites using a mechanism internal to them? Like searching a forum without Google.

Actually, you are right. But there are two important nuances:

1. The "forum" in this case is the entire network. (Well, not everything that uses the protocol, but one deliberately built network, which is normally separate from other networks using the same protocol. You need to log in to the network.)
2. It works!

I have seen Gopher at work as library catalogues in Finland and Estonia since the early '90s. Library catalogues have to be searchable. The search feature is built into the protocol. The protocol is used to store the catalogue, and the protocol itself ensures that the catalogue is searchable. There is no need to rely on some other provider or additional service/product to crawl the catalogue/network and display the search results.

In the case of HTTP, the search engine is a separately built service: one server (or a bunch of servers) crawling the web to register other servers. When people do web searches, they think they see results from the web, but actually they see results from the crawler, which registers whatever it registers, misses whatever it misses, and displays the results according to its own priorities.
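The crawler problem can be shown with a toy model: a crawler starts from some known pages and follows links, so anything nobody links to (for instance records reachable only through a catalogue's own search form) never enters its index. The link graph below is invented, and real crawlers add robots.txt handling, politeness rules and ranking on top of this:

```python
from collections import deque

# Hypothetical toy web: page -> pages it links to
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "article"],
    "about": [],
    "article": ["news"],
    "catalogue": ["record-42"],  # nothing links here
    "record-42": [],
}


def crawl(seeds: list[str]) -> set[str]:
    """Breadth-first crawl: only pages reachable by links from the seeds get indexed."""
    index, queue = set(seeds), deque(seeds)
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in index:
                index.add(target)
                queue.append(target)
    return index


index = crawl(["home"])
print(sorted(index))               # ['about', 'article', 'home', 'news']
print(sorted(set(LINKS) - index))  # ['catalogue', 'record-42']
```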

In Gopher, the results are precise and unmistakable. If you can't find something in Gopher, it's because it's not there or you are mistyping. There's no third option. For library catalogues this makes perfect sense.
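For comparison, this is roughly what a Gopher search looks like on the wire according to RFC 1436: a search item (item type 7) is queried by sending its selector, a TAB, the search words and CRLF, and the server answers with an ordinary menu of TAB-separated lines ending with a lone ".". A minimal sketch; the host, selectors and sample reply below are invented for illustration, and real catalogue servers may differ in details:

```python
import socket


def search_request(selector: str, query: str) -> bytes:
    """Request for a Gopher search item (type 7): selector, TAB, search words, CRLF."""
    return f"{selector}\t{query}\r\n".encode("utf-8")


def parse_menu(raw: str) -> list[dict]:
    """Parse a Gopher menu: one item per line, TAB-separated fields, ended by a lone '.'."""
    items = []
    for line in raw.splitlines():
        if line == ".":
            break  # end-of-menu marker
        fields = line[1:].split("\t")
        if not line or len(fields) < 4:
            continue  # skip blank or malformed lines
        display, selector, host, port = fields[:4]
        items.append({"type": line[0], "display": display,
                      "selector": selector, "host": host, "port": int(port)})
    return items


def gopher_search(host: str, selector: str, query: str, port: int = 70) -> list[dict]:
    """Send the search over TCP (Gopher's usual port is 70) and parse the reply menu."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(search_request(selector, query))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_menu(b"".join(chunks).decode("utf-8", errors="replace"))


# Invented sample reply, as a catalogue server might send it (type 0 = document, 1 = menu)
SAMPLE = ("0Tove Jansson: Comet in Moominland\t/rec/1042\tcatalogue.example\t70\r\n"
          "1Other works by Tove Jansson\t/au/jansson\tcatalogue.example\t70\r\n"
          ".\r\n")
for item in parse_menu(SAMPLE):
    print(item["type"], item["display"], item["selector"])
```

The point the sketch illustrates is that the answer comes from the server holding the catalogue itself, not from a third-party index.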

I just checked Wikipedia, and it says that Opera never supported the Gopher protocol. Sure enough, Gopher is no longer in the settings in v12, but I remember using Gopher with Opera 6. Fifteen years ago, Gopher support was pretty normal for web browsers.

Re: Ways to search stuff on the net

Reply #7 – 2015-08-13, 16:16:00

An interesting web-search tool I found: Surfraw.

The project was launched by Julian Assange, probably about a decade ago.

It's basically a set of scripts, one per search engine, that turn your search terms into a search URL for that engine and open it in a normal web browser, which then fetches and displays the results.

The odd terminology and style of the documentation must be due to the particular kind of geekhood of Surfraw's creator. Some samples:

Quote from: Surfraw, http://surfraw.alioth.debian.org/

Surfraw provides a fast unix command line interface to a variety of popular WWW search engines and other artifacts of power. It reclaims google, altavista, babelfish, dejanews, freshmeat, research index, slashdot and many others from the false-prophet, pox-infested heathen lands of html-forms, placing these wonders where they belong, deep in unix heartland, as god loving extensions to the shell.

[...]

Global options are common to all Surfraw elvi (clients). You can get a list of the currently installed elvi by typing surfraw -elvi.

All elvi have useful low calorie help, for example:

$ sr rhyme -help
[...]

Surfrawize the soul of your favourite internet wonder. Join the Shell Users' Revolutionary Front Against the WWW by submitting code. Reclaim heathen lands. Bear witness to the truth. Its love will set you free.

Elvi (curiously treated as a plural; the singular is supposed to be elvis) are the included search engines. You search by typing surfraw on the command line, followed by the name of the elvis and the search terms. Surfraw has a built-in alias, sr, so it can also be called by typing sr.

It also includes a global config file where a graphical and a command-line browser are already defined. In the version that I installed, the browsers were luakit and elinks respectively.

I tried it, and it works. The advantages are not that significant, though. I like the idea that Surfraw separates the search engines from the web browser, but I would like it better if the search engines were still easy to maintain and configure, as in a (good) web browser.

I mean, I want liberal renaming of the elvi, liberal tweaking/updating of the search URLs and terms, and ease and comfort in that kind of maintenance work. Basically, I would like a sane frontend to the thing.
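A frontend of that kind could be as simple as an editable table of names and URL templates. A minimal sketch in Python, not Surfraw's own mechanism (real elvi are shell scripts): the ELVI table, the {q} template convention and the helper names are all made up for illustration, and the browser is whatever Python's standard webbrowser module picks as the default:

```python
import webbrowser
from urllib.parse import quote_plus

# Hypothetical user-editable table: rename keys or update templates freely.
ELVI = {
    "google": "https://www.google.com/search?q={q}",
    "wiki": "https://en.wikipedia.org/w/index.php?search={q}",
    "ddg": "https://duckduckgo.com/?q={q}",
}


def search_url(elvis: str, *terms: str) -> str:
    """Fill the chosen engine's URL template with the URL-encoded search terms."""
    return ELVI[elvis].format(q=quote_plus(" ".join(terms)))


def search(elvis: str, *terms: str) -> None:
    """Hand the URL to the default browser, much as Surfraw hands it to the configured one."""
    webbrowser.open(search_url(elvis, *terms))


print(search_url("ddg", "surfraw", "elvi"))
```

Keeping the templates as plain data means renaming an engine or fixing a changed URL is a one-line edit, which is the kind of maintenance comfort described above.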

Here's a tutorial on how to create your own elvi. Enjoy!