Create a Twitter Archive – A PDF dump of all tweets from an account

My @SubhashBrowser account has 51 tweets, so this command creates its Twitter dump. The first argument is the account name (without the @) and the second is the number of tweets:

bash twitter-to-pdf.txt SubhashBrowser 51
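
The script below depends on wkhtmltopdf, pdftk and a TweetsToRSS server at http://localhost:8080 (the address it fetches pages from). A quick preflight check can catch a missing piece before a long run fails halfway. This is a minimal sketch that is not part of the original script. The function names `missing_tools` and `tweetstorss_up` are hypothetical, and the server check assumes `curl` is installed.

```shell
# Hypothetical preflight helpers (not part of twitter-to-pdf.txt).

# Print the name of every command in "$@" that is not on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || echo "$tool"
  done
}

# Succeed if something answers HTTP requests at the given base URL.
# Assumes curl is installed. Without --fail, any HTTP response counts as "up";
# only connection errors or a timeout make this return non-zero.
tweetstorss_up() {
  curl -s -o /dev/null --max-time 5 "${1:-http://localhost:8080/}"
}

# Example preflight before calling twitter-to-pdf.txt:
#   missing=$(missing_tools wkhtmltopdf pdftk)
#   [ -z "$missing" ] || { echo "Install first: $missing" >&2; exit 1; }
#   tweetstorss_up || { echo "Start Subhash TweetsToRSS first" >&2; exit 1; }
```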

Subhash TweetsToRSS must be running when you execute this command, because the script fetches the tweets from it at http://localhost:8080. You also need wkhtmltopdf and pdftk installed. The content of twitter-to-pdf.txt is as follows:

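# Usage: bash twitter-to-pdf.txt <twitter-handle> <number-of-tweets>
if [ $# -ne 2 ]; then
  echo "Usage: bash $0 <twitter-handle> <number-of-tweets>" >&2
  exit 1
fi

# Initialize variables
iPage=1
iMax=1
iRemainder=0
sDocs=""

# The script assumes 25 tweets per TweetsToRSS page, so round the page count up
let "iMax = $2 / 25"
let "iRemainder = $2 % 25"
if [ $iRemainder -gt 0 ]; then
  let "iMax = iMax + 1"
fi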

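# JavaScript that wkhtmltopdf runs on each page to remove the YouTube iframes
# generated by TweetsToRSS. getElementsByTagName() returns a live collection,
# so removing the first iframe n times removes all of them.
# (Bash has no "+" string operator; backslash-newline inside double quotes
# joins the lines into one string.)
sJavaScript="(function(){ \
  var arIframes = document.getElementsByTagName('iframe'); \
  var n = arIframes.length; \
  for (var i = 0; i < n; i++) { \
    arIframes[0].parentElement.removeChild(arIframes[0]); \
  } \
})()"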

# Create archives (one PDF per TweetsToRSS page)
for (( iPage=1; iPage<=iMax; iPage++ ))
do
  echo "Converting http://localhost:8080/?q=%40$1&output=html&older-than=$iPage"
  wkhtmltopdf --quiet \
              --encoding utf-8 \
              --debug-javascript --javascript-delay 60 --run-script "$sJavaScript" \
              --user-style-sheet wkhtml_twitter_style.css \
              --image-dpi 300 -s A3 \
              --margin-top 0 --margin-right 0 --margin-bottom 0 --margin-left 0 \
              "http://localhost:8080/?q=%40$1&output=html&older-than=$iPage" \
              "@$1-archive-$iPage.pdf"
  sDocs="$sDocs @$1-archive-$iPage.pdf"
done

# Combine archives ($sDocs is deliberately unquoted so it splits into separate file names)
pdftk $sDocs cat output "Complete-Twitter-Archive-of-@$1.pdf"

# Write the document metadata and apply it to the combined PDF
echo "InfoKey: Creator" > "$1-meta.txt"
echo "InfoValue: Subhash TweetsToRSS (www.vsubhash.com)" >> "$1-meta.txt"
echo "InfoKey: Title" >> "$1-meta.txt"
echo "InfoValue: The complete Twitter archive of @$1" >> "$1-meta.txt"
echo "InfoKey: Subject" >> "$1-meta.txt"
echo "InfoValue: Collection of all tweets from @$1" >> "$1-meta.txt"
echo "InfoKey: Author" >> "$1-meta.txt"
echo "InfoValue: V. Subhash" >> "$1-meta.txt"
echo "InfoKey: Keywords" >> "$1-meta.txt"
echo "InfoValue: $1, twitter, tweets, archive" >> "$1-meta.txt"
pdftk "Complete-Twitter-Archive-of-@$1.pdf" \
      update_info "$1-meta.txt" \
      output "The-Complete-Twitter-Archive-of-@$1.pdf"

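# Cleanup: move the intermediate files into a backup folder
# (mkdir -p does not complain if the folder already exists)
mkdir -p "$1-backup"
if [ -d "$1-backup" ]; then
  mv "@$1-archive-"*.pdf "./$1-backup"
  mv "Complete-Twitter-Archive-of-@$1.pdf" "./$1-backup"
  mv "$1-meta.txt" "./$1-backup"
fi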
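
The ten echo lines that build the metadata file can also be written as a single here-document. This is a sketch of an alternative, not the author's version. `write_meta` is a hypothetical helper name, and the InfoKey/InfoValue pairs are copied unchanged from the script above, including the hard-coded creator and author.

```shell
# Hypothetical helper: write the pdftk update_info file for handle "$1" to file "$2".
# Produces the same InfoKey/InfoValue pairs as the echo lines in twitter-to-pdf.txt.
write_meta() {
  cat > "$2" <<EOF
InfoKey: Creator
InfoValue: Subhash TweetsToRSS (www.vsubhash.com)
InfoKey: Title
InfoValue: The complete Twitter archive of @$1
InfoKey: Subject
InfoValue: Collection of all tweets from @$1
InfoKey: Author
InfoValue: V. Subhash
InfoKey: Keywords
InfoValue: $1, twitter, tweets, archive
EOF
}

# Usage inside twitter-to-pdf.txt:
#   write_meta "$1" "$1-meta.txt"
#   pdftk "Complete-Twitter-Archive-of-@$1.pdf" update_info "$1-meta.txt" \
#         output "The-Complete-Twitter-Archive-of-@$1.pdf"
```

The heredoc lines are not indented on purpose: a plain `<<EOF` keeps leading whitespace, which would end up in the metadata values.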

All 51 tweets from my @SubhashBrowser account have been dumped into a PDF archive.
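
For 51 tweets, the page arithmetic at the top of the script gives 51 / 25 = 2 with a remainder of 1, so it converts three TweetsToRSS pages and merges three PDFs. The same rounding-up (ceiling) division fits in one arithmetic expansion. The sketch below is illustrative only: `page_count` and `archive_names` are hypothetical helpers, and the 25 tweets per page is the figure the script assumes.

```shell
# Hypothetical helpers, assuming 25 tweets per TweetsToRSS page as the script does.

# Number of pages needed for "$1" tweets: the ceiling of $1 / 25.
page_count() {
  echo $(( ($1 + 24) / 25 ))
}

# Print the per-page PDF names the script creates for handle "$1" and "$2" tweets.
archive_names() {
  local pages i
  pages=$(page_count "$2")
  i=1
  while [ "$i" -le "$pages" ]; do
    echo "@$1-archive-$i.pdf"
    i=$(( i + 1 ))
  done
}

# In bash, collecting the names in an array keeps each file name intact for pdftk:
#   mapfile -t docs < <(archive_names "$1" "$2")
#   pdftk "${docs[@]}" cat output "Complete-Twitter-Archive-of-@$1.pdf"
```

For a 3,000-tweet account, `page_count 3000` returns 120, so the script would run wkhtmltopdf 120 times.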

Twitter seems to limit the number of tweets you can pull from an account to just over 3,000. Here is the complete Twitter archive of US President Donald J. Trump (@realDonaldTrump):

https://www.scribd.com/document/337137738/The-Complete-Twitter-Archive-of-Donald-J-Trump-RealDonaldTrump-2016-2017
