Google

Ever wonder how Google spell check, related topic suggestions, or ranking works? I do.

My theory (yes, theory; I doubt anyone but the two founders truly knows the secret to how Google works) is that Google collects information on how users behave: the clicks on a link, the number of sites pointing to a link, and so on. It uses this information to statistically guess at what the user truly wants, based on data that seems to correlate with the user’s behavior.

In the case of a misspelling, instead of performing some sort of Levenshtein-distance check to find the best candidate for the word, Google can simply collect data on what users typed after the typo, and suggest the word that has most frequently been typed in response to it.
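To make the idea concrete, here is a minimal sketch of a behavior-based suggester. It is only an illustration of my theory, not how Google actually does it: the log format (pairs of a query and the next query typed in the same session), the function names, and the `min_count` threshold are all made up for this example.

```python
from collections import Counter, defaultdict


def build_correction_table(query_log):
    """Count which query users typed right after each earlier query."""
    followups = defaultdict(Counter)
    for first, retry in query_log:
        if first != retry:
            followups[first][retry] += 1
    return followups


def suggest(followups, query, min_count=2):
    """Return the most frequent follow-up for a query, or None if too rare."""
    candidates = followups.get(query)
    if not candidates:
        return None
    best, count = candidates.most_common(1)[0]
    return best if count >= min_count else None


# Hypothetical log of (query, next query in the same session) pairs.
log = [
    ("gogle", "google"),
    ("gogle", "google"),
    ("gogle", "goggle"),
    ("pyhton", "python"),
]
table = build_correction_table(log)
print(suggest(table, "gogle"))   # seen twice, so it clears the threshold
print(suggest(table, "pyhton"))  # seen only once, so no suggestion yet
```

Notice that nothing in this code looks at the spelling of the words at all; it only counts what people did next.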

Topic suggestions probably work the same way as misspellings: they look at behavioral similarities to suggest content that you might like.

My point is that in this process, Google probably doesn’t need to have the slightest idea of what’s in the content that it’s displaying on its pages.

My belief is that Google initially parsed content to figure out how to sort it and seed their database, and once the database was well seeded, they collected user behavior and used that to rank relevance.

I came across an article that seems to back this theory of mine, so I decided to post about it today:

http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

I completely disagree with the author on how we can throw the scientific method out of the window now since we have so much data, but I did appreciate the possible insight on how Google does things.

Now hosted with DownTownHost

I just moved my server to DownTownHost. What a WORLD of difference. BlueHost = 8 cpus with an average serverload of 100. DownTownHost 8 cpus with an average serverload of .07.

Also, I applied for the 4.95 per month plan and it’s offering all the features I want and need. After the 25% discount using the code below it’s 3.71 per month. That’s even better than GoDaddy. For now, I am very happy with the service.

Anyways, if you want to check out the host yourself use this link: DownTownHost

Oh btw, use the code “happy2008” to qualify for a 25% discount.

BlueHost = LackOfDecentHostBluesHost

So I’ve been with BlueHost for 2-3 months now. I can say with certainty that as soon as I find a decent host, I’m moving again. I monitored BlueHost for 10 days, from June 2 – June 12, and the logs show that BlueHost is pretty much overloaded all the time. When I called their tech support, they gave me some lame excuse like “It’s not unusual for servers to be overloaded during peak hours for 5-10 minutes at a time in a shared hosting environment”. Since the server load coincidentally went down by the time I was done waiting for the representative to pick up, I had no choice but to wait for it to go back up before calling again. Unfortunately, the minute I hung up, the server load spiked again. This is when I decided to log the server loads. The logs show that the server is overloaded 50-90% of the time on average. I think that is simply unacceptable. If my web page takes FOREVER to load, I no longer consider it hosted. I think it’s okay if they’re overselling, as long as they keep the load under control. I’ve heard HostGator does a decent job of this, but that doesn’t mean I’m going to switch to HostGator. I want to find an even better host. Anyways, you can take my word for it, or you can click the links to my logs of BlueHost server loads (they’re in the format of: <time>, <server load>):

June 2
June 3
June 4
June 5
June 6
June 7
June 8
June 9
June 10
June 11-12
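
For anyone who wants to check a log like these themselves, here is a rough sketch of how the “overloaded X% of the time” figure can be computed from lines in the <time>, <server load> format. The sample lines and their timestamp format are invented for illustration (they are not copied from my logs), and I’m treating “overloaded” as a load above the CPU count of 8.

```python
def overloaded_fraction(lines, cpu_count=8):
    """Return the fraction of samples whose load exceeds the CPU count."""
    loads = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the time field is harmless.
        _time, load = line.rsplit(",", 1)
        loads.append(float(load))
    if not loads:
        return 0.0
    over = sum(1 for load in loads if load > cpu_count)
    return over / len(loads)


# Made-up sample lines in the "<time>, <server load>" format.
sample = [
    "2008-06-02 10:00, 12.4",
    "2008-06-02 10:05, 7.9",
    "2008-06-02 10:10, 35.0",
    "2008-06-02 10:15, 3.2",
]
print(f"{overloaded_fraction(sample):.0%} of samples above 8 CPUs")
```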

Bluehost Review

This entry is about my experience with BlueHost so far, after 2 months. I can’t say I’m particularly happy with their service. I’ve experienced high server loads at 3 am, due to the MySQL database backup. I’ve experienced high server loads from 10 am to 3 pm due to peak usage. I’m probably going to experience high server loads during the evening as well. So my question is, when can I expect there to be a normal server load? When nobody surfs the web? What’s the point of having web hosting if that’s the case?

If you don’t know what a server load is, it’s a number that roughly represents how many CPUs’ worth of work is running or waiting to run. Your server performs best when that number is less than the total number of CPUs. I’m pretty envious of my co-worker because his server load at HostGator never seems to exceed 5 and his site doesn’t seem bogged down, whereas my server load seems to exceed 8 ALL THE TIME and my site is constantly laggy.

I’m writing a script that tracks the server load and displays the information graphically. I’m going to use that information to try to get BlueHost to move me to a better server. If after all that, they still don’t do anything about my server load, I’m probably going to move on to a new hosting company.
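
The logging half of such a script can be quite small. The sketch below is an assumption about how one might do it rather than the exact script I’m writing: it reads the 1-minute load average with Python’s `os.getloadavg()` (available on Unix-like systems) and appends one line per call. The function names and the timestamp format are my own choices here.

```python
import os
import time


def format_sample(timestamp, load):
    """Format one log line as '<time>, <server load>'."""
    return f"{timestamp}, {load:.2f}"


def record_load(path):
    """Append the current 1-minute load average to the log file at `path`."""
    load_1min, _load_5min, _load_15min = os.getloadavg()
    line = format_sample(time.strftime("%Y-%m-%d %H:%M:%S"), load_1min)
    with open(path, "a") as log:
        log.write(line + "\n")
    return line
```

Calling `record_load("serverload.log")` every few minutes (from a scheduled job, for example) builds up a log that can be graphed later.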

Cross-browser Compatibility

I was working on a project that required cross-browser compatibility, which generally means at least Firefox and IE. The reason is that IE is still the most commonly used browser (IE7, IE6, etc.), with Firefox coming in second and the other browsers splitting up the rest.

I was trying to code the following structure:

<div>
  <div></div><div></div>
</div>
<div>
  <div></div><div></div>
</div>

Except the two innermost divs were floated left, followed by a break that cleared the float. The code rendered perfectly in Firefox and IE7, but it didn’t render correctly in IE6. So I racked my brain over it for a bit, looked up various reasons why IE6 might render the code differently, and eventually found a solution: make the div’s position relative. This solution is completely counter-intuitive and, frankly, doesn’t make much sense. So the moral of the story is that sometimes the solution can seem very dumb, but regardless, it’s the solution.

Dev and Live Environment

Today’s topic is the importance of having two environments: one to develop your code in, and one to release to the public. Sometimes it is simply easier to edit, run, and test code in the live environment, but when your code deals with data, this becomes more problematic. Imagine some code that inserts data into the database whenever you run it. If you run it in the dev environment, it’s really no big deal, but if you run it in the live environment, it might pollute the database. Your code should account for that case anyway, but it’s hard to guarantee that during your “updates” you won’t accidentally code it so that it starts inserting a bunch of meaningless data into your database. Having two environments allows you to code and test whichever way you want without having to worry about the consequences.
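
One simple way to keep the two environments apart is to have the code ask which environment it is in before touching any data. The sketch below is just one possible approach: the `APP_ENV` variable, the database file names, and the function names are all hypothetical, picked only for this example.

```python
import os

# Hypothetical database names, one per environment.
DATABASES = {
    "dev": "dev_blog.sqlite3",
    "live": "live_blog.sqlite3",
}


def current_environment(environ=os.environ):
    """Read the environment name from APP_ENV, defaulting to the safe 'dev'."""
    env = environ.get("APP_ENV", "dev")
    if env not in DATABASES:
        raise ValueError(f"unknown environment: {env!r}")
    return env


def insert_test_rows(environ=os.environ):
    """Pretend to insert test data, but refuse to do so on the live site."""
    env = current_environment(environ)
    if env == "live":
        raise RuntimeError("refusing to insert test data into the live database")
    return f"would insert test rows into {DATABASES[env]}"


print(insert_test_rows({"APP_ENV": "dev"}))
```

Defaulting to dev means that forgetting to set the variable can only ever hit the dev database, never the live one.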

Migration from GoDaddy to BlueHost

I have just finished moving my site from GoDaddy to BlueHost. This form of migration was the first one I’ve ever done, and it went quite smoothly.

I think a few tips that would help anyone migrate from one host to another would be to first figure out if you’re transferring the domain, web hosting, or both.

Domain hosting is simply the reservation of the domain name, such as google.com, yahoo.com, jacksonleung.com, etc., much like an address, or a telephone number.

Web hosting actually contains all the files and databases behind the domain name, much like the company an address points to, or the customer service representatives behind a telephone number.

If you’re simply changing the web hosting, like I did in my case, not only will you have to migrate the database and the files, you’ll most likely also have to change the nameservers of your domains to the nameservers of the new web hosting provider.

Afterwards, you have to make sure all the data from your databases was copied correctly from one server to the other, and then make sure all your scripts work with the new database environment. You might also want to move your emails from your old web hosting provider to the new one.
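
A quick sanity check for the “copied correctly” part is to compare row counts table by table. The sketch below uses two in-memory SQLite databases as stand-ins so that it runs anywhere; my actual databases are MySQL, and the `posts` table and the function names are made up for the example. Matching counts don’t prove the data is identical, but a mismatch is a sure sign something went wrong.

```python
import sqlite3


def table_counts(conn, tables):
    """Return {table name: row count} for the given tables."""
    counts = {}
    for table in tables:
        # Table names come from our own fixed list, never from user input.
        row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        counts[table] = row[0]
    return counts


def mismatched_tables(old_conn, new_conn, tables):
    """Return {table: (old count, new count)} for tables whose counts differ."""
    old = table_counts(old_conn, tables)
    new = table_counts(new_conn, tables)
    return {t: (old[t], new[t]) for t in tables if old[t] != new[t]}


# Stand-in databases: the "new" one is deliberately missing a row.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db, rows in ((old_db, 3), (new_db, 2)):
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO posts (id) VALUES (?)", [(i,) for i in range(rows)])

print(mismatched_tables(old_db, new_db, ["posts"]))
```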

Although it might be unnecessary, I like to run through my scripts one last time just to make sure everything works, and after all that, you can cancel the domain / web hosting with the previous provider.

The Big Picture

I generally prefer to write pseudocode before I actually start coding. It helps me establish what the big picture is without having to worry about the nitty-gritty details. Keeping the big picture in mind is useful for any sort of code, from the very simple to the super complex.

When you’re adding a feature or coding something that requires cooperation between various components, it’s best to keep the macro view in mind. In the macro view, you can keep track of what’s supposed to happen and how the various components behave. This is especially important because we often work on only one segment of the code at a time, and sometimes we get so lost in the minor details that we forget the code’s function in the overall scheme of things.