
How to make your website serve pages faster?

I made this list for CNN but it might be useful for any website owner.

CNN source code message

HTML source code at CNN.com has a hidden message. CNN Labs team asks for more ways to speed up their website.

I am always on the lookout for junk-serving domain names that I can add to my OS hosts file and CNN provided a number of them. Browsing through their source code, I found a message asking for ideas to improve site speeds. The CNN Labs team has already implemented several speed improvement techniques such as CDN, DNS prefetching, async loads, and code minifying, and was interested in anything that developers curious enough to look into their code could provide. Here are a few that I came up with:

  • Load HTML first. Everything else should be loaded afterwards. When CSS and JS are loaded along with HTML, they will block the display of the page content for at least one or two seconds. To do it properly, pre-generate static HTML pages from your CMS and serve those static HTML pages to your visitors. Don’t put your HTML content in a database and serve it using a server-side script every time someone requests an article. Static HTML pages are served without any server-side processing latency. Like an image file, a static page is copied to the browser without any processing. It will not be parsed and interpreted like a PHP or ASPX script. (Pre-compiled scripts may be faster but not as fast as a simple file copy to an output response stream.) Use server-side scripts for what they were originally meant for – really dynamic stuff such as Ajax requests, database searches and changes, handling live data and processing user input.
  • Lazy-load Javascript files, CSS stylesheets, images and videos using Javascript code in the static HTML pages. (My unreleased CMS does these two things already – check out www.vsubhash.com. While this CMS is targeted at people with personal websites, these two ideas can reduce waiting times for any kind of website.)
  • Don’t use custom fonts that get automatically downloaded from your website or even the Google fonts repository. (CNN uses its own “CNN” font.) Use a universal stylesheet instead. Browsers can render web pages faster if they can use fonts already installed on the device. Does that make your webpages appear differently on different devices? Sweet mother of God! How will you survive? Really? No, sir, we don’t think much about your corporate/branding font. It barely registers on our minds. You are serving English text to an English audience. You don’t need a custom font. Custom fonts are meant for languages that don’t have universal device support or if the text is in a non-Unicode non-standard encoding tied to some legacy font. The need to display bar codes on a shopping site is a good use-case for a custom font. Emojis are not. Your EOT font download is likely to get stuck with all the other junk your page is downloading simultaneously. The visitor sees blanks everywhere in place of text, wondering what has gone wrong. Is that good for your corporate image? Don’t shock the user. Please learn about the Principle of Least Surprise (PLS).
  • Don’t think people with enough bandwidth will have no problem. Whether you are on a PC or a mobile device (good Linux computers exempted), there are hundreds of processes that are forever talking to base and downloading/uploading files and data.  Your webpage is competing with that and other tabs already opened in the browser. (Google’s policy states that Android will be talking to base, even if no app is running, even if it is in sleep mode, even if you have turned off wireless… So, imagine the situation when the device is being actively used.)
  • Don’t use images when the same effect can be achieved by CSS. But don’t use a CSS or Javascript framework when a simple image would do.
  • Don’t use a third-party CMS or big Javascript frameworks for your high-traffic website. This kind of software suffers from code bloat as CMS developers try to cram in more features or try to make certain features work in all situations. Write your own CMS with custom Javascript and CSS optimized for your website. Maximum control & efficiency should be the goal. (CNN web pages, typical of off-the-shelf CMS-driven websites, contain a ton of JS and CSS and very little HTML.) Another problem with popular CMS programs is that site admins well-versed with them are not easy to find. Even if you do find them, it may be difficult for them to determine what a piece of code is doing. You eventually end up hiring developers who can add extra or customized functionality over the CMS-generated Javascript and CSS. Good developers will write their own code but many will resort to using more third-party CMS plugins. Over time, the code bloat will get beyond control. Very few developers in the Web team will be familiar with the site’s code and be able to troubleshoot effectively.
  • When you use ad networks, it is inevitable that you lose some control. However, you can still:
    • load their script asynchronously (as CNN has already done).
    • ensure that their code is within limits. Many ad networks download a ton of (the same) Javascript libraries to do the simplest of things.
    • avoid using more than one ad network in one page serving. You can rotate different providers on different browser requests. If you hit so many third-party sites for one request, as CNN does, what efficiency will you achieve?
  • Don’t use autoplay videos, even if you are a TV news channel. Are you misrepresenting video plays to advertisers? What? Top tech companies do it as well? Well, it is not a matter of just ethics but money too. Be assured that autoplay videos are a waste of bandwidth when the visitor just wants to read your article or is already listening to music. Even with the inflated statistics, you will still lose money… over time.
  • Why use a third-party service to serve related articles from your own site? Ensure that your articles are tagged and categorized appropriately and serve their links on the sidebar, ordered by date and/or popularity. Ensure that this “related” stuff is not part of the static HTML churned out by your CMS. Lazy-load related content as well.
  • Limit the number of redirects in the links you post online. CNN links on Twitter pass through a maze of domains before settling down. These hops are time-consuming because each one needs its own HTTPS connection. Here is a list of redirects for one of their links (https://cnn.it/2DtZSme).
    • t.co (Twitter) to cnn.it
    • cnn.it to bit.ly
    • bit.ly to trib.al
    • trib.al to www.cnn.com
    • www.cnn.com to edition.cnn.com
  • Is your website accessibility-friendly? When you create an accessible website, you create a fast website by default. If you see more CSS+JS than HTML in a ‘View Source’, then your site is not accessible. It is not search engine-friendly either.
  • Don’t use pages that automatically reload. Are you a moron? (Not you, sir/madam, but I can list a dozen websites, a prominent news aggregation site among them, that do this without shame and this question is rhetorically addressed to them.) Use Ajax unobtrusively to change only the updated content.
  • Understand the principle behind Ajax. You display a page first and then use Ajax to inject data into it without reloading. Gmail does this in reverse. It will make you wait while it loads several megabytes of XML data, containing all your emails, before it displays your inbox – totally defeating the idea behind XML requests. Load what is needed, not everything from the beginning of time.
  • Don’t use social media plugins as-is-where-is. They usually block the loading of the page. Study their no-script versions and use your Javascript to generate valid links dynamically.
  • Analytics code increases bloat, reduces the responsiveness of the site and increases bandwidth usage. Do you really need it? There are many free server programs that can process server log files and generate insightful visitor stats. Check your CMS or ask your hosting provider. If you need to study click-intensive regions for improving site effectiveness, use analytics code on a temporary basis and get rid of it once you have generated enough data.
  • Don’t think that as Javascript engines are so fast, browsers will have no problem with all the gunk you have included in your webpages. Nah-hah! Javascript engines may claim to be fast but HTML and CSS rendering is still slow. Don’t believe me? Check out my Javascript benchmark test. After that, add network latencies (over which you have no control) to the mix and judge for yourself.
  • Over time, your pages will become overweight and be lethargic. Either give your site a redesign or start afresh. If not, do regular weight trimming. Measure the dry and wet weight of a web page. Is it really worth it? Regular code reviews help identify parts that are no longer required and provide ideas to optimize CSS and JS.
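
The redirect chain described in the list above can be inspected from a terminal. The following is a minimal sketch, assuming curl is installed; list_hops is a hypothetical helper name, not something CNN or Twitter provides. It reads the response headers that curl -sIL prints and lists each redirect target in order.

```shell
#!/bin/bash
# list_hops (hypothetical helper): reads HTTP response headers on stdin,
# as printed by "curl -sIL URL", and prints each redirect target in order.
list_hops() {
  # Header names are case-insensitive; strip trailing carriage returns first.
  tr -d '\r' | awk 'tolower($1) == "location:" { n++; print n ": " $2 }'
}

# Usage (needs network access):
#   curl -sIL https://cnn.it/2DtZSme | list_hops
```

Every line printed is one extra round trip that the visitor pays for before the article even starts loading.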

How to get the silly Twitter verified status without any real verification

With this Greasemonkey script it is easy.

// ==UserScript==
// @name        Twitter Verified Status For Everyone
// @namespace   com.vsubhash.js.twitter-verified-status-for-everyone
// @include     https://twitter.com/*
// @version     1
// @grant       none
// ==/UserScript==

var oEls;
var sVerifiedHtml1 = "<span>&rlm;</span><span class=\"UserBadges\"><span class=\"Icon Icon--verified\"><span class=\"u-hiddenVisually\">Verified account</span></span></span><span class=\"UserNameBreak\">&nbsp;</span>";
var sVerifiedHtml2 = "<span class=\"ProfileHeaderCard-badges\"><a href=\"/help/verified\" class=\"js-tooltip\" target=\"_blank\" title=\"Verified account\" data-placement=\"right\" rel=\"noopener\"><span class=\"Icon Icon--verified\"><span class=\"u-hiddenVisually\">Verified account</span></span></a></span>";


function addTwitterVerifiedStatus() {
	console.log("TVSFE: Changing HTML");
	oEls = document.getElementsByTagName("span");
	if (oEls.length > 0) {
		for (var i = 0; i < oEls.length; i++) {
			if (oEls[i].className) {
				if (oEls[i].className.indexOf("FullNameGroup") > -1) {
					if (oEls[i].innerHTML.indexOf("Icon Icon--verified") == -1) {
						oEls[i].innerHTML += sVerifiedHtml1;
						console.log("TVSFE: Screenname icon added");
					}
				}
			}
		}		
	}
	oEls = document.getElementsByClassName("ProfileHeaderCard-name");
	if (oEls.length > 0) {
		for (var i = 0; i < oEls.length; i++) {
			if (oEls[i].innerHTML.indexOf("ProfileHeaderCard-badges") == -1) {
				oEls[i].innerHTML += sVerifiedHtml2;
				console.log("TVSFE: Profile icon added");
			}
		}
	}	
}


function handle_DOMLoaded(aoEvent) {
	try {
		console.log("TVSFE: Page loaded");
		window.setTimeout(addTwitterVerifiedStatus, 3*1000);
		
		addTwitterVerifiedStatus();
	} catch (e) {
		console.error("TVSFE: Error - " + e);
	}
}


document.addEventListener("DOMContentLoaded", handle_DOMLoaded, false);

How to create a Twitter archive of your tweets – Use WkhtmlToPDF and TweetsToRSS to save your tweets as PDF

“com.vsubhash.bash.twitter-to-pdf.txt” is a Unix/Linux shell script for archiving all messages posted by a Twitter account in PDF format.

The command to use is:

bash com.vsubhash.bash.twitter-to-pdf.txt \
  realDonaldTrump \
  4000 \
  nojs

The first argument to bash is the shell script. The second argument is the Twitter account. The third argument is the number of tweets to archive. The fourth argument is used if the Javascript used by TweetsToRSS causes problems with WKhtmlToPDF.

This code assumes that WKhtmlToPDF is available in /opt. Else, you need to update the script. WKhtmlToPDF uses the Qt WebKit rendering engine under the hood. The good thing about it is that all HTML links are preserved.
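
Before a long archive run, it may help to confirm that the required programs can actually be found. This is a minimal sketch and not part of the original script; require_tools is a hypothetical helper, and the /opt/wkhtmltox/bin path in the usage comment is only an assumed install location.

```shell
#!/bin/bash
# require_tools (hypothetical helper): succeeds only if every named
# command can be found on PATH; prints the missing ones to stderr.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: add the /opt location (an assumed path) to PATH first, then check.
#   PATH="$PATH:/opt/wkhtmltox/bin"
#   require_tools wkhtmltopdf pdftk || exit 1
```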

Tweets to PDF conversion


Compile from source and install Apache HTTP Server 2.4 and PHP 7 on Linux

A how-to guide to build localhost (“virtual host”) PHP 7 sites on the latest Apache HTTP server

I am using a nearly decade-old version of Ubuntu as my main OS. As it runs Gnome 2 and Firestarter, I will be sticking with it until rapture time. However, I did not want the old Apache and PHP from the “Old Releases” Ubuntu repository. Apache HTTP Server 2.4 was released in March this year. I couldn’t care less for the new features. The old Apache would have worked just fine but new was new. So, I decided to install the latest Apache HTTP server and PHP 7 by building them from source. There aren’t any tutorials on building Apache 2.4 and PHP 7 from source, particularly for the requirements of a development machine. Besides that, many of the configuration files, executables (a2ensite) and directories seemed to be missing after the compilation/installation process was complete. It required a bit of thinking and RTFM. So, I decided to put it all down here in one place.

Apache HTTP Server 2.4 (httpd) compilation, installation and configuration

First, download the source files.

cd ~/Build
wget -c url-of-pcre-archive
wget -c url-of-apache-httpd-archive 
wget -c url-of-apr-util
wget -c url-of-apr

Extract the source files of PCRE and httpd in a directory named Build. The source files of the Apache Portable Runtime (APR and APR-util) need to be extracted inside the srclib directory of the httpd directory.

cd ~/Build
tar -xf pcre*
tar -xjf http*.bz*

cd http*
cd srclib
tar -xf ../../apr-1*
tar -xf ../../apr-util-1*
ln -s apr-1* apr
ln -s apr-util-1* apr-util

Next, compile and install PCRE to some directory (/usr/local/pcre).

cd ~/Build
cd pcre*
./configure --prefix=/usr/local/pcre
make
sudo make install # Installs PCRE to directory /usr/local/pcre

Finally, compile Apache httpd to some directory (/opt/httpd) with specified PCRE and Apache Runtime directories.

cd ~/Build
cd http*
./configure --prefix=/opt/httpd --with-included-apr --with-pcre=/usr/local/pcre
make
sudo make install # Install Apache httpd to directory /opt/httpd

The Apache HTTP Server has been installed. Now, it is time to start it and check it in a browser.

sudo /opt/httpd/bin/apachectl -k start # Start server
firefox http://localhost  # Serves /opt/httpd/htdocs/index.html

The configuration files of Apache httpd 2.4 seem to be different from the ones from the Apache installed from the old Ubuntu repositories. The main configuration file will be at /opt/httpd/conf/httpd.conf
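
After any edit to httpd.conf, a quick sanity check can save a failed restart. This is a minimal sketch, assuming the /opt/httpd prefix used above; check_httpd is a hypothetical helper name, while httpd -v and apachectl -t are the standard version and syntax-check switches.

```shell
#!/bin/bash
# check_httpd (hypothetical helper): prints the server version and runs a
# configuration syntax check for an httpd installed under PREFIX.
check_httpd() {
  local prefix="${1:-/opt/httpd}"
  if [ ! -x "$prefix/bin/apachectl" ]; then
    echo "apachectl not found under $prefix" >&2
    return 1
  fi
  "$prefix/bin/httpd" -v       # version and build date
  "$prefix/bin/apachectl" -t   # reports "Syntax OK" when httpd.conf parses
}

# Usage:
#   check_httpd /opt/httpd
```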

PHP7 compilation, installation and configuration

Next, compile and install PHP 7 to some directory (/opt/php-7). When executing the configure statement, you need to specify the location of the Apache Extension Tool (/opt/httpd/bin/apxs).

sudo /opt/httpd/bin/apachectl -k stop # Stop Apache HTTP Server
cd ~/Build
cd php-7*
./configure --help | more
read -p "Press enter to continue" oNothing 
./configure --with-apxs2=/opt/httpd/bin/apxs --prefix=/opt/php/php-7/arch-all --exec-prefix=/opt/php/php-7/arch
make
make test # This step is optional 
sudo make install # Install PHP to directory /opt/php/php-7
sudo libtool --finish ~/Build/php-7*/libs # Completes the libtool installation of the PHP 7 shared library

This completes the PHP 7 installation. Please note that the make install step (through apxs) writes the following line to Apache server’s httpd.conf file.

LoadModule php7_module        modules/libphp7.so
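
To confirm that the line really made it into the configuration file, a small grep is enough. This is a sketch under the paths used above; has_load_module is a hypothetical helper name.

```shell
#!/bin/bash
# has_load_module (hypothetical helper): succeeds if CONF contains an
# uncommented LoadModule line for MODULE.
has_load_module() {
  local conf="$1" module="$2"
  grep -Eq "^[[:space:]]*LoadModule[[:space:]]+${module}[[:space:]]" "$conf"
}

# Usage:
#   has_load_module /opt/httpd/conf/httpd.conf php7_module \
#     && echo "PHP 7 module is configured"
```

On a working installation, /opt/httpd/bin/httpd -M should also list the module among the loaded ones.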

After PHP 7 has been installed, add a PHP file extension handler to Apache server’s httpd.conf file.

# The following is a multi-line statement. Copy all of it.
echo '
  <FilesMatch \.php$>
    SetHandler application/x-httpd-php 
  </FilesMatch>' | sudo tee -a /opt/httpd/conf/httpd.conf 

Copy the development php.ini in the Build directory to /opt/php/php-7/arch/lib directory.

cd ~/Build/php-7*
sudo cp php.ini-development /opt/php/php-7/arch/lib/php.ini

Modify this file appropriately for your development setup. For production, you will have to modify and use php.ini-production as the php.ini file. This completes the PHP installation and configuration.

Now, it is time to test it. Create a sample PHP file in the Apache server’s htdocs directory and request it in a browser.

echo '<?php phpinfo(); ?>' > php-test.php
sudo mv ./php-test.php /opt/httpd/htdocs/php-test.php
sudo /opt/httpd/bin/apachectl -k start # Start Apache HTTP Server
firefox http://localhost/php-test.php

This will prove that your HTTPd server can handle PHP. However, to develop websites on your local development machine, some more changes need to be made. You can delete the /opt/httpd/htdocs/php-test.php file.

Apache HTTP Server reconfiguration for virtual hosts

First, append a line in your /etc/hosts file so that browser requests to your test site (say me-website.com) are routed to your local installation of the httpd server.

echo "127.0.0.1 me-website.com" | sudo tee -a /etc/hosts
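
Running the command above twice leaves duplicate entries in /etc/hosts. A minimal sketch of an idempotent variant follows; add_host_entry is a hypothetical helper name, and it must run with enough privileges to write to the file (or against a copy for testing).

```shell
#!/bin/bash
# add_host_entry (hypothetical helper): appends "IP NAME" to a hosts file
# unless an uncommented line already maps NAME.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # Compare NAME against every host field (field 2 onwards) of each entry.
  if awk -v name="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
       END { exit !found }' "$file"; then
    return 0
  fi
  echo "$ip $name" >> "$file"
}

# Usage (as root):
#   add_host_entry /etc/hosts 127.0.0.1 me-website.com
```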

Next, clean up httpd.conf by removing the PHP file handler and the PHP LoadModule statement. Then, append the following lines to httpd.conf.

# Added by me
Include conf/me-websites.conf 

Now, httpd.conf is mostly in its original state, save for the line that tells httpd to load the new me-websites.conf configuration file. Then, create the me-websites.conf file and put all your virtual host configurations in it. Copy this file to the /opt/httpd/conf directory. This will ensure that PHP scripting is limited to your “virtual host” sites.

Listen 8080

# LoadModule is only valid in the server config context, so it stays outside <VirtualHost>
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule php7_module        modules/libphp7.so

<VirtualHost *:8080>
  ServerAdmin webmaster@me-website.com
  DocumentRoot "/home/account-name/Websites/PHP/me-website/htdocs"
  ServerName me-website.com
  ServerAlias www.me-website.com
  ErrorLog "/home/account-name/Websites/PHP/me-website/logs/me-website.com.error.log"
  TransferLog "/home/account-name/Websites/PHP/me-website/logs/me-website.com.access.log"

  <FilesMatch \.php$>
    SetHandler application/x-httpd-php
  </FilesMatch>
  
  <Directory "/home/account-name/Websites/PHP/me-website/htdocs">
    Options Indexes FollowSymLinks
    AllowOverride All
    Require all granted
  </Directory>

</VirtualHost>

Next, add your account to the user group www-data. Your account can be the owner of the directory /home/account-name/Websites/PHP/me-website/htdocs but you need to give read, write and execute permissions to www-data. Now, you can put your PHP scripts in /home/account-name/Websites/PHP/me-website/htdocs and start coding. You can test the site in a browser at http://me-website.com:8080/
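
The group and permission changes can be scripted. This is a minimal sketch, not the exact commands I ran; grant_group_access is a hypothetical helper name, and the choice of g+rwX (execute only on directories and on files that are already executable) is an assumption about what is sufficient.

```shell
#!/bin/bash
# grant_group_access (hypothetical helper): hands DIR to GROUP and gives the
# group read/write access, plus execute on directories (capital X).
grant_group_access() {
  local dir="$1" group="$2"
  chgrp -R "$group" "$dir" && chmod -R g+rwX "$dir"
}

# Usage (account-name and the path are the placeholders used above):
#   sudo usermod -a -G www-data account-name   # takes effect at next login
#   grant_group_access /home/account-name/Websites/PHP/me-website/htdocs www-data
```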

If you would like to test it on port 80, then remove the “Listen 80” line in httpd.conf and replace 8080 with 80 in me-websites.conf. Then, you can simply test your sites at http://me-website.com/ or http://me-website.com:80.

Fix for autogenerate bug in Eclipse PHP run configuration

I have already published how to modify the Eclipse source files to fix the autogenerate bug in the Run configuration of Eclipse PHP. If that is too much for you, then you can simply let Apache HTTP Server fix it for you. Create a .htaccess file in your home folder. This will remove your Eclipse PHP project name from the URL that the run configuration generates.

RewriteEngine On
Redirect "/me-website" "/"

Interestingly, the proprietary IDE (derived from the same open-source project) sold by the creator of PHP does not suffer from this PDT “bug”.

Create a Twitter Archive – A PDF dump of all tweets from an account

I have 51 tweets in my @SubhashBrowser account. So, this command will create the Twitter dump.

bash twitter-to-pdf.txt SubhashBrowser 51

Subhash TweetsToRSS must be running when you execute this command. You also need wkhtmltopdf and pdftk installed. The content of twitter-to-pdf.txt is as follows:

# Initialize variables
iPage=1
iMax=1
iRemainder=0
sDocs=""

let "iMax = $2 / 25"
let "iRemainder = $2 % 25"
if [ $iRemainder -gt 0 ]; then
  let "iMax = iMax + 1"
fi

# Javascript to remove YouTube iframes generated by TweetsToRSS
sJavaScript="(function(){ \
  var arIframes=document.getElementsByTagName('iframe'); \
  var n = arIframes.length; \
  for (var i=0;i<n;i++) { \
    arIframes[0].parentElement.removeChild(arIframes[0]); \
  }})()"

# Create archives
for (( iPage=1; iPage<=iMax; iPage++ ))
do
  echo "Converting http://localhost:8080/?q=%40$1&output=html&older-than=$iPage"
  wkhtmltopdf --quiet \
              --encoding utf-8 \
              --debug-javascript --javascript-delay 60 --run-script "$sJavaScript" 	\
              --user-style-sheet wkhtml_twitter_style.css \
              --image-dpi 300 -s A3 \
              --margin-top 0 --margin-right 0 --margin-bottom 0 --margin-left 0 \
              "http://localhost:8080/?q=%40$1&output=html&older-than=$iPage" \
              "@$1-archive-$iPage.pdf"
  sDocs="$sDocs @$1-archive-$iPage.pdf"	 					  
done

# Combine archives
pdftk $sDocs cat output Complete-Twitter-Archive-of-@$1.pdf

# Updated metadata
echo "InfoKey: Creator" > $1-meta.txt
echo "InfoValue: Subhash TweetsToRSS (www.vsubhash.com)" >> $1-meta.txt
echo "InfoKey: Title" >> $1-meta.txt
echo "InfoValue: The complete Twitter archive of @$1" >> $1-meta.txt
echo "InfoKey: Subject" >> $1-meta.txt
echo "InfoValue: Collection of all tweets from @$1" >> $1-meta.txt
echo "InfoKey: Author" >> $1-meta.txt
echo "InfoValue: V. Subhash" >> $1-meta.txt
echo "InfoKey: Keywords" >> $1-meta.txt
echo "InfoValue: $1, twitter, tweets, archive" >> $1-meta.txt
pdftk Complete-Twitter-Archive-of-@$1.pdf \
      update_info $1-meta.txt \
      output The-Complete-Twitter-Archive-of-@$1.pdf

# Cleanup
mkdir $1-backup
if [ -d $1-backup  ]; then
	mv @$1-archive-*.pdf ./$1-backup
	mv Complete-Twitter-Archive-of-@$1.pdf ./$1-backup
	mv $1-meta.txt ./$1-backup
fi

All 51 tweets from my @SubhashBrowser account have been dumped into a PDF archive.
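
The page count at the top of the script is a ceiling division: 25 tweets per page, plus one more page for any remainder. A minimal sketch of the same arithmetic in one expression follows; pages_needed is a hypothetical helper name, and the default of 25 is taken from the script.

```shell
#!/bin/bash
# pages_needed (hypothetical helper): ceiling of TOTAL / PER_PAGE
pages_needed() {
  local total="$1" per_page="${2:-25}"
  echo $(( (total + per_page - 1) / per_page ))
}

# Usage:
#   pages_needed 51     # prints 3
#   pages_needed 4000   # prints 160
```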

Twitter seems to have placed a limit of just above 3000 to the number of tweets you can pull from an account. Here is the complete Twitter archive of US President Donald J Trump (@realDonaldTrump).

https://www.scribd.com/document/337137738/The-Complete-Twitter-Archive-of-Donald-J-Trump-RealDonaldTrump-2016-2017

BASH script to check for broken hyperlinks from a list

This BASH script checks several links from a file.

for sLine in $(< url-list.txt); do
  echo "Checking $sLine"
  # Let curl report the status code itself; this works for HTTP/1.1 and HTTP/2 responses
  sStatusCode=$(curl -s -o /dev/null -I -w "%{http_code}" "$sLine")
  echo -e "\tHTTP status: $sStatusCode"
  if [ "$sStatusCode" = "200" ]; then
    echo "$sLine" >> valid-urls-list.txt
    echo -e "\tValid => \e[33;1m$sLine\e[0m"
  else
    echo -e "\tInvalid => \e[31;1m$sLine\e[0m"
  fi
done

BASH script to check a list of proxy IP addresses

Some websites in North America do not allow access to people from the rest of the world. You can access them via publicly available proxy IP addresses in that region. There are many websites that provide lists of proxy IP addresses (http://pastebin.com/search?q=proxies). Some proxies work on a special port. You just need to append the port number to the IP with a colon in between. For example, 46.165.250.235:3128. I usually copy the table to Calc (spreadsheet program) and then data-transform the pasted text to obtain the IP:port combination.
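
The spreadsheet step can also be done in the shell. This is a minimal sketch, assuming the copied table has the IP address in the first column and the port in the second (that column order is an assumption about the source table); to_ip_port is a hypothetical helper name.

```shell
#!/bin/bash
# to_ip_port (hypothetical helper): reads rows on stdin whose first two
# whitespace-separated columns are an IP address and a port, and prints
# IP:port. Header rows and blank lines are skipped because their second
# column is not a number.
to_ip_port() {
  awk 'NF >= 2 && $2 ~ /^[0-9]+$/ { print $1 ":" $2 }'
}

# Usage:
#   to_ip_port < copied-table.txt > proxy-list.txt
```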

Testing the IPs in the list can be a hassle if you have to do it manually using a browser. This BASH script can automate it. Just run it and see which proxy is fast. Press Ctrl+C when you find a good proxy. This script expects the proxy IP addresses to be placed in a file named proxy-list.txt.

sed '/^$/d' proxy-list.txt > proxy-list2.txt

for sLine in $(< proxy-list2.txt); do
  export http_proxy="http://$sLine"
  echo "Testing $sLine"
  wget --spider --timeout=5 --tries=1 http://www.example.com
done
rm proxy-list2.txt

Output of the BASH script to check for working proxy IP addresses.

From the output of this script, you need to pick those IPs that return a “200 OK” status and put them in a list of good working proxies. Then, create a new Firefox profile (firefox -ProfileManager) and change its proxy to one from the working proxies list.

After that, whenever you need to access a site that is not available directly from your ISP, use the new Firefox profile that you created (firefox -P proxy-profile-name). Proxies tend to fail after some time but you can just use another from the list.

How to kill an application or process in Linux using BASH

The pkill command kills a named application but it does not work all the time. It can fail even when used with sudo. The good old kill command is the ultimate solution. Hence, I use this alias in my .bashrc file in the home directory.

alias kll='bash ~/Scripts/kll.txt'

In the script file, I have this code:

for sID in `pgrep $1`
do
  ps $sID
  kill -STOP $sID
done

for sID in `pgrep $1`
do
  ps $sID
  kill -KILL $sID
done

To kill an app, I type kll PROCESS-NAME. If it does not work, I would sudo it.
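
The same two-pass idea (freeze every matching process first, then terminate them) can be written as a function that takes process IDs. This is a minimal sketch; stop_then_kill is a hypothetical helper name, and the -x switch in the usage comment makes pgrep match the process name exactly.

```shell
#!/bin/bash
# stop_then_kill (hypothetical helper): sends STOP to all given PIDs first,
# then KILL, so that no process keeps running while the others are handled.
stop_then_kill() {
  local pid
  for pid in "$@"; do kill -STOP "$pid" 2>/dev/null; done   # pass 1: freeze
  for pid in "$@"; do kill -KILL "$pid" 2>/dev/null; done   # pass 2: terminate
  return 0
}

# Usage:
#   stop_then_kill $(pgrep -x firefox)
```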

Install Caja Actions for Mate Desktop, the fork of Nautilus Actions Configuration

Mentioned in my March 2016 article in Open Source For You (OSFY) magazine

In Mate desktop, the file manager is not Nautilus but caja. And, Nautilus Actions Configuration (created by Roberto Majadas) does not work with caja. Wolfgang Ulbrich has ported Nautilus Actions for Mate/caja. It is called Caja Actions and is available at https://github.com/raveit65/caja-actions

None of the steps given online worked. They were not complete either. There is a DEB package linked on that GitHub page but it is not a DEB file. I get a .man file instead. If you prefer a DEB, then you should install checkinstall and substitute the sudo make install command with sudo checkinstall --install=no to create a DEB package. Don’t forget to change the name to “caja-actions” and set the correct version number.

sudo apt-get install checkinstall xclip xsel yelp-tools libglib2.0-dev devscripts build-essential libgtk2.0-dev libunique-dev libgtop2-dev libgtop2-7 libxml2-dev uuid-dev libcaja-extension libcaja-extension-dev docbook-utils
mkdir build-caja-actions
cd build-caja-actions
git clone https://github.com/raveit65/caja-actions
cd caja-actions/
 
NOCONFIGURE=1 ./autogen.sh
./configure --disable-schemas-install --with-gtk=2 --enable-mateconf=yes --disable-scrollkeeper --enable-html-manuals --with-default-io-provider=na-desktop
make
sudo make install