Linux Fu: Roll with the Checksums

We are often struck by how often we spend time trying to optimize something when we would be better off just picking a better algorithm. There is the old story about the mathematician Gauss who, when in school, was given busy work to add the integers from 1 to 100. While the other students laboriously added each number, Gauss realized that 100+1 is 101 and 99 + 2 is also 101. Guess what 98 + 3 is? Of course, 101. So you can easily find that there are 50 pairs that add up to 101 and know the answer is 5,050. No matter how fast you can add, you aren’t likely to beat someone who knows that algorithm. So here’s a question: You have a large body of text and you want to search for it. What’s the best way?



Of course, that’s a loaded question. Best can mean many things and will depend on the kind of data you are dealing with and even the type of machine you are using. If you are just looking for a string, you could, of course, do the brute force algorithm. Let’s say we are looking for the word “convict” in the text of War and Peace:


Start with the first letter of War and Peace
If the current letter isn’t the same as the current letter of “convict”  then move to the next letter, reset the current letter in “convict” and go back to step 2 until there are no more letters.
If the current letters are the same, move to the next letter of convict and, without forgetting the current letter of the text, compare it to the next letter. If it is the same, keep repeating this step until there are no more letters in “convict” (at which point you have a match). If not the same, reset the cu ..

Support the originator by clicking the read the rest link below.