Useful command to test the speed of a container, VM, or system

I’ll be breaking down the following command part by part:

time dd if=/dev/zero of=test.dat bs=1024 count=100000

 

What does time do? It runs a process and then captures how long it took to execute.

What about dd? It copies data from an input to an output, which default to standard input and standard output when no files are given.

What about the params if, of, bs, and count?

“if”: The input file. In this case we’re taking input from /dev/zero, a special file that provides as many null characters as are read from it; an infinite file of sorts.

“of”: It’s the output file.

“bs”: The block size, in bytes

“count”: the number of blocks

So all together, the command writes 100,000 blocks of 1,024 bytes of binary zeroes into the file “test.dat”. In other words, it generates a file of roughly 100 MB (102,400,000 bytes), and timing that write gives you a quick test of the IO performance of a system. As we move towards a world where we’re optimizing the crap out of everything, this is a very useful command to know.
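If dd isn’t available, the same idea can be sketched in a few lines of Python. This is only a rough equivalent, not a replacement for the command above: the function name write_zero_file is made up for illustration, and unlike the plain dd command it calls fsync before stopping the clock, so the numbers won’t match dd exactly.

```python
import os
import tempfile
import time


def write_zero_file(path, block_size=1024, count=100_000):
    """Write `count` blocks of `block_size` zero bytes; return (total_bytes, seconds)."""
    block = bytes(block_size)  # a buffer of zero bytes, like a read from /dev/zero
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        # Assumption: we want the data pushed to disk before the timer stops.
        # Plain dd (without conv=fsync) does not do this.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return block_size * count, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        total, seconds = write_zero_file(os.path.join(tmp, "test.dat"))
        print(f"{total} bytes ({total / 1e6:.1f} MB) written in {seconds:.3f}s")
```

The byte count is just bs times count: 1,024 × 100,000 = 102,400,000 bytes, which is where the "roughly 100 MB" comes from.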

Amazon S3 Outage

Today’s post is regarding https://techcrunch.com/2017/02/28/amazon-aws-s3-outage-is-breaking-things-for-a-lot-of-websites-and-apps/

These types of occurrences are becoming more and more common. Tons of companies have placed a ton of faith in the Amazon ecosystem, and time and time again, it looks like Amazon has let them down. When these things break, they break at a MASSIVE scale (AWS outage knocks Amazon, Netflix, Tinder and IMDb in MEGA data collapse, https://www.theregister.co.uk/2015/09/20/aws_database_outage/ )

http://research.omicsgroup.org/index.php/Amazon_Web_Services

There were other outages in 2012, 2013, and probably more unlisted. I think it’s an interesting challenge that Amazon is tackling, and I feel like more and more of the web is putting all of their eggs into one giant basket.

I wonder, if we were to build a truly scalable system that is unlikely to be impacted, whether it might make sense to diversify the system’s infrastructure across multiple services. Maybe some redundancy at the DNS layer, then some more at the LB, some more in how things are replicated, localized and so on… Just something to reflect on due to today’s outage: “How can I prevent my organization from being impacted by this?”

How I cleaned more than 8,000 emails from my mail box

A long, long time ago, shortly after the birth of Gmail, I created an email account, and mail was good. Fast forward to now: holy spam. After more than a decade of neglect, I’ve managed to amass more than 11,000 emails, and this is post spam filter. I guess over the years, I must’ve signed up for every single notification and newsletter out there. Each time I delete an email and unsubscribe from a list, another newsletter shows up, and I’d think that I must’ve unsubscribed already, but I’m not too sure anymore. All I knew was that my inbox was looking like this:

I’d stare at that number every day, thinking “Someday, I’ll clean it, but today is not the day…”

The idea of going through my mail one by one, checking whether each sender was a bulk sender, and then unsubscribing from it, just seemed like such a time-consuming task. Then I started noticing that in the midst of the spam, here and there, there were some important emails I was starting to miss. That was the spark that lit my fire to put an end to this spam once and for all.

Using my computer programming powers, I created a program to go through my mail and build a list of the senders I receive emails from, along with the number of emails I have from each:

thousands upon thousands of emails later

I’d built a list of senders and their frequencies, and life was good, but I knew I could do better.

I took it a step further, and built another list based on the domain of the sender.
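The counting itself is the easy part. Here is a minimal sketch of building both lists, assuming the From headers have already been pulled out of the mailbox somehow (how you fetch them is outside this sketch). The function name count_senders and the sample addresses are made up for illustration; this is not the original program.

```python
from collections import Counter
from email.utils import parseaddr


def count_senders(from_headers):
    """Return (per-address counts, per-domain counts) for a list of From headers."""
    by_address = Counter()
    by_domain = Counter()
    for header in from_headers:
        _, address = parseaddr(header)  # "Name <a@b.c>" -> ("Name", "a@b.c")
        address = address.lower()
        if "@" not in address:
            continue  # skip headers without a usable address
        by_address[address] += 1
        by_domain[address.rsplit("@", 1)[1]] += 1
    return by_address, by_domain


if __name__ == "__main__":
    # Hypothetical sample data standing in for a real mailbox.
    sample = [
        "Deals <news@shop.example>",
        "news@shop.example",
        "Receipts <billing@shop.example>",
        "A Friend <friend@mail.example>",
    ]
    addresses, domains = count_senders(sample)
    for domain, n in domains.most_common():
        print(domain, n)
```

Sorting by count with most_common() puts the noisiest senders and domains at the top, which is where the bulk deleting pays off fastest.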

Utilizing these two newly crafted weapons in my arsenal, I blew away thousands upon thousands of emails, some of which were spam, some of which were transactional emails that no longer had any importance. Once the unimportant emails had been unsubscribed from and removed, it was so much easier to deal with and organize my old emails. With that noise gone, it was also much easier to deal with my new emails. Now, my emails look like this

And life… was good.

PHP Framework Plugin Evaluation

Which one is the BEST framework? Well! There are many ways to benchmark a framework: speed, adoption, usability and so on. Today, I want to examine the plugin community for these frameworks.

I’ve pulled a list from http://hybridauth.sourceforge.net/plugins.html and https://github.com/opauth/opauth/wiki and I plan to review only the frameworks that are on both URLs. The reason is that I don’t believe it makes sense to code authentication systems anymore. It’s been done a trillion times before, so why are we reinventing the wheel? If a framework isn’t listed on these two URLs, I’ll prematurely conclude that its community isn’t active enough to put it on the map.

The frameworks that show up on both URLs are as follows:

  • CakePHP
  • CodeIgniter
  • Laravel
  • Symfony
  • Yii
  • Zend

Here are the URLs I’m using to compare the plugin / extension libraries of each framework:

This isn’t meant to draw any real conclusions, but it does give an idea of how active each community is, and the sheer amount of pre-coded stuff out there. I basically went through each site and scraped the URLs, then followed the URLs and parsed the resulting HTML for the date on which each extension was updated. I haven’t probed any further than that, although at this point, I am hopeful that if I were on either the Symfony or the Laravel platform, I could look forward to a lot of pre-written code.

Hybris vs Magento

“We’re on Magento, but we need to upgrade to Hybris!”

“Nothing is true, everything is permitted”

I went to magento.com and hybris.com, took a look at two companies, one from each, and then benchmarked the two companies’ sites. Which of the following do you think is the “better” version?

bloom Oakley

 

The slower loading one is actually Hybris. The faster one is Magento. People are often quick to dismiss languages, technologies, and software. I say nay! Try to figure things out first before you throw away all those “extra screws”. It’s important to do a cost-benefit analysis on MANY fronts.

Don’t buy into hype. Too much of this world is built upon inefficiencies. Do understand that interests often conflict: what is in your best interest isn’t necessarily in their best interest.

Hybris is built in Java. Java has many pros, but one of the cons is that developers are hard to find, and it’s not exactly the fastest to code in either. Magento is built on PHP, which has many cons, but one of the pros is that PHP developers are plentiful and projects can be built quickly and, oftentimes, very cost-effectively.

Just understand that the more complex and inaccessible your environment, the harder it is to scale. You’ll run into many forms of scaling issues, whether it’s code, load, or human capital. Switch to a solution only after carefully assessing its pros and cons. This choice MUST be made extremely carefully because the impact of the decision is extremely far reaching. Also understand that simply because certain things are “best practices” doesn’t necessarily mean they’re the “best practice” for your company and situation.

Only one way to do things? The cat would disagree

I had a discussion with an industry peer today regarding databases. He arrived at two conclusions, which are right, but also wrong. One, “strings have no business being in a SQL statement”; two, “IDs have no basis being in a mapping table”. On the first, from a pure data storage and efficiency perspective, he’s correct, but from a practical perspective, he’s wrong. On the second, from a pure database perspective, he’s correct, and from a real-world perspective, he’s wrong.

Strings have no business being in a SQL statement

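The point of readability is to provide the ability to deduce, at a glance, as much information as reasonably possible. So let’s say we have the following database table structure:

[Diagram: three tables. Section (idSection, Title), Article (idArticle), and SectionArticleMap (Section_idSection, Article_idArticle) linking the two.]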

 

How would you query all the articles of a section? My response is:

SELECT * FROM Section S
JOIN SectionArticleMap SAM ON S.idSection = SAM.Section_idSection
JOIN Article A ON A.idArticle = SAM.Article_idArticle
WHERE S.Title = 'Name'

The only response he thinks is acceptable is:

SELECT * FROM SectionArticleMap SAM
WHERE Section_idSection = 1
AND Article_idArticle = 1

He claimed a string has no place being in a SQL statement. He believes there’s only one correct way, and I’m sorry, but he’s wrong. He favors IDs because they’re immutable, and he believes they will last longer, which is true, but if you look at categories, they’re represented by names, and not IDs. In a sea of SQL statements, I would have to do a lot of grunt work to figure out exactly which section an ID-based statement is tied to, and if I wanted to re-use it, I’d have to figure out which ID to replace it with. The former allows me to easily figure out the section and re-use the query. The section is called “Name”, and if I need to re-use the statement for another section, I simply change the name.

I’m not saying the former is THE CORRECT way of doing things, nor am I claiming the latter is the INCORRECT way of doing things. What I’m claiming is that the strong statement that ‘such things have no business being in a SQL query’ is wrong. The former is clearly easier to understand than the latter. I know at a glance that I’m fetching articles for a section titled “Name”. With the latter, I’ll have to do some additional queries, and if the titles aren’t maintained in the DB but in the code, then some code diving, and if the DB structure somehow became unsynced with the code, then some nightmares are due to follow. There are pros and cons for every approach.
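To make the title-based query concrete, here is a small sketch that runs it against an in-memory SQLite database. The table names and the columns idSection, Title, idArticle, Section_idSection and Article_idArticle follow the queries in this post; the Headline column, the sample rows, and the function names are assumptions added purely for illustration.

```python
import sqlite3


def articles_for_section(conn, title):
    """Return (idArticle, Headline) rows for the section with the given title."""
    query = """
        SELECT A.idArticle, A.Headline
        FROM Section S
        JOIN SectionArticleMap SAM ON S.idSection = SAM.Section_idSection
        JOIN Article A ON A.idArticle = SAM.Article_idArticle
        WHERE S.Title = ?
        ORDER BY A.idArticle
    """
    # The title is passed as a bound parameter rather than pasted into the SQL.
    return conn.execute(query, (title,)).fetchall()


def build_demo_db():
    """Create a throwaway database; the Headline column and rows are made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Section (idSection INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE Article (idArticle INTEGER PRIMARY KEY, Headline TEXT);
        CREATE TABLE SectionArticleMap (
            Section_idSection INTEGER REFERENCES Section(idSection),
            Article_idArticle INTEGER REFERENCES Article(idArticle)
        );
        INSERT INTO Section VALUES (1, 'Name'), (2, 'Other');
        INSERT INTO Article VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
        INSERT INTO SectionArticleMap VALUES (1, 1), (1, 2), (2, 3);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(articles_for_section(conn, "Name"))
    print(articles_for_section(conn, "Other"))
```

Re-using the query for another section is a matter of passing a different title, which is the readability argument in a nutshell.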

IDs have no basis being in a mapping table

I basically add an ID to all tables nowadays for cross-platform compatibility. I informed him that during my time as a professional developer, I’ve come across scenarios that merited an ID being in a mapping table, to which he countered that he’s been working professionally for 25 years, that there is never a case for an ID column in a mapping table, and that anything requiring it is just crap code. It appears that during his time he might not have dealt with the need for many different codebases to interface with the same database, or at the very least, not CakePHP. “By convention the ORM also expects each table to have a primary key with the name of id” (http://book.cakephp.org/3.0/en/orm/table-objects.html)

From a database perspective, it’s very easy to say that the ID as a primary key takes up unnecessary space, and is bad practice, but once you factor CakePHP into the picture, then having an ID IS the best practice.

Is CakePHP crap code? I personally don’t think either CakePHP or any software built on top of it is crap code. There is always room for improvement, but without understanding the rhyme or reason of why things are the way they are, I’m hesitant to claim things are broken.

I’m not a big fan of people with high technical responsibility being extremely closed-minded. Certain solutions aren’t ideal for one case, but might be ideal for another, which is why in academia, you’re going to hear a lot of “it depends”. People whose lives involve wisdom and learning oftentimes know that there’s never a clear-cut answer for everything, and everything depends on other factors. Why then is the world so littered with single-solution answers?

Managerial Assessment

It’s that time of the project again: something went wrong, and a goat needs to be sacrificed. As a person who is often found to be in charge of projects, I hold my bosses to the same standards as I hold myself and my underlings. If something goes wrong, the problem goes from the bottom, all the way to the top of the chain.

In a simplistic example, assume there is a dev team, a team lead, a CTO, and then the CEO. If the project falls apart, and there is a firing decision, the CEO MUST have a team debrief. Every single member involved needs to write, in their own opinion, what happened. Sure, a project could’ve failed because someone on the bottom didn’t know what they were doing, but at the same time, isn’t it the team lead’s job to make sure they knew? Then isn’t it the CTO’s job to make sure the team lead’s on task? Isn’t it the CEO’s job to make sure the CTO is capable of such actions?

Fact of the matter, incompetence happens at all levels of a corporation and company. Just because there is a scapegoat doesn’t mean the issue has been taken care of. You have a termite infestation, you’ve killed a termite, but the infestation still exists.

As a CEO, you should gather data on various people’s perspectives on what the issue is, and formulate your own decision. You have to get a perspective of how things look from the top down, and then another perspective of how things look from below. Just like a game of “communication”, if you don’t know what message the very end received, then you don’t know whether the message was corrupted along the way. In fact, unless you investigate the “nodes”, you won’t even know where and when things got corrupted. Not debriefing is like letting your ship sail through iceberg-ridden waters without checking for icebergs.

Scapegoating will buy bad management time between the current SNAFU and the next, but if you catch the manager in the act of scapegoating, you can prevent yourself from losing some very talented individuals (human capital) while, at the same time, preventing the bad manager from gaining power. Think about it this way: once the bad manager sets the tone that anyone who disagrees with his horrible management style will get fired, who will correct his actions? A strong IT company needs to be built on allowing talent, innovation, and best practices to flourish. Allowing bad managerial nodes will create a chilling effect, which will ultimately hamper your IT team, and ultimately your business.

As a CEO, debriefings, exit interviews, and whatnot are the least you can do. As a board member or investor, I’d expect a CEO to do at least this much. Even at the highest level, there is such an assessment, so why would you think that as a CEO, you can afford to simply take management’s word for things? Even auditors are brought into the picture from time to time. To improve, you must assess; progress without assessment is most likely just bull excrement.

www subdomain or no www subdomain

This is a very old topic, but I’ve arrived at a definitive answer on whether or not the main domain should have www. The answer is “it should”.

The reason is that a cookie set at the domain level exists for all subdomains. If you have subdomains, or ever plan to have subdomains in the future, it’s best to use the “www” subdomain for your main site, so that the main site’s cookies stay on “www” instead of being sent to every other subdomain. It’ll pay off by saving you some headaches down the line when you have specialized subdomains in the future (blog, beta, members, etc.)
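To see why, here is a deliberately simplified sketch of how a browser decides whether a cookie with a Domain attribute goes along with a request, loosely following the domain-matching idea in RFC 6265. The function name cookie_is_sent and the example.com hostnames are made up, and real browsers handle more cases (cookies with no Domain attribute, IP addresses, public suffixes) that this ignores.

```python
def cookie_is_sent(cookie_domain, request_host):
    """Simplified check: the host equals the cookie's domain or is a subdomain of it."""
    cookie_domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    request_host = request_host.lower()
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


if __name__ == "__main__":
    hosts = ["example.com", "www.example.com", "blog.example.com", "beta.example.com"]
    for domain in ("example.com", "www.example.com"):
        sent_to = [h for h in hosts if cookie_is_sent(domain, h)]
        print(f"cookie for {domain} is sent to: {sent_to}")
```

In this model a cookie scoped to example.com rides along to blog and beta too, while a cookie scoped to www.example.com stays on the main site.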