Synecdoche

Posts

Setting up virtual environment for Python

Many specialized tools are written for a specific version of Python, such as Python 2.7, and depend on specific versions of packages, such as pandas 0.7.3. Installing these older versions system-wide will replace the newer versions and create conflicts with existing code. A better option is to create a virtual environment that contains only the required package versions. For example, QSTK does not work with Python 3 or pandas 0.21; it only works with Python 2.7 and pandas 0.7.3. So we have to create a virtual environment and install these versions.

    virtualenv --python=/usr/bin/python2.7 ~/python2.7-virtual-env

This will create the ~/python2.7-virtual-env directory if it doesn't exist, and also create directories inside it containing a copy of the Python interpreter, the standard library, and various supporting files. Now source the activate script in that directory to start a new environment (loosely like a chroot environment, although it only isolates Python packages, not the filesystem).

    source ~/python2.7-virtual-env/bin/activate

This will start a new envi...

Batch rotate video files

I like to capture videos of scenery while I am driving, using my phone mounted on a car phone holder. The phone keeps recording and later I edit out the boring parts. This works very well when the phone is mounted horizontally both when the recording starts and when it ends. However, if I pick up the phone at any point during recording, the video gets saved in a vertical format, so I end up having to rotate a lot of videos. They can be rotated in video editing software such as Kdenlive by applying the "Rotate" effect, but then the video gets chopped off at the sides. I find it easier to rotate the videos before importing them into the video editing software, so I wrote a batch script to rotate a bunch of videos at once. The script runs on every mp4 file in the directory. The script needs to be ...
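The excerpt stops before the script itself, so the following is only a minimal sketch of one way such a batch rotation could look, not the script from the post. It assumes ffmpeg is installed and that a 90 degree clockwise rotation is wanted; the function name rotate_all and the "rotated" output folder are hypothetical.

```shell
# Rotate every .mp4 file in a directory by 90 degrees clockwise.
# Assumptions: ffmpeg is available; output goes to a "rotated" subfolder
# (a hypothetical choice) so the originals are left untouched.
rotate_all() {
    dir="${1:-.}"
    mkdir -p "$dir/rotated"
    for f in "$dir"/*.mp4; do
        [ -e "$f" ] || continue      # skip the literal pattern if no .mp4 exists
        out="$dir/rotated/$(basename "$f")"
        # transpose=1 rotates 90 degrees clockwise; the audio stream is copied as-is
        ffmpeg -nostdin -i "$f" -vf "transpose=1" -c:a copy "$out"
    done
}

# Usage (hypothetical path):
#   rotate_all ~/Videos/drive
```

Replacing transpose=1 with transpose=2 rotates counterclockwise instead, which may be needed depending on how the phone was held.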

Moving Android Studio from home to system folders

Android Studio takes up a lot of space with its SDKs and virtual devices. The home folder gets full quickly. One option would be to move the entire Android Studio suite to its own folder in one of the system folders. We will use the /opt folder, but /usr/local can also be used (I used that before).

First, create the directory structure at /opt

    $ su
    $ cd /opt
    $ mkdir Android

The /opt/Android folder will hold Android Studio, the SDKs, and the AVDs.

    $ mv /home/me/android-studio /opt/Android
    $ mv /home/me/Sdk /opt/Android

The AVD folders are trickier as they are in a hidden folder in the user's home directory.

    $ cd /home/me/.android
    $ mv avd /opt/Android

Now we need to link the moved folders so that Android Studio can access them.

1. Link the AVD folder as normal user

Open another terminal as the regular user

    $ cd ~/.android
    $ ln -s /opt/Android/avd avd

Verify that the link points to the right directory with

    $ ls -l
    lrwxrwxrwx 1 me...

Identifying delimiter of a CSV file

The following one-liner can be used to extract the delimiter of a CSV file. This command does not work on TAB-separated files. It only works on delimited files whose field separators are not whitespace.

    $ head -n1 bookmerged.csv | tr -d '[a-z][A-Z][0-9]' | \
      tr -d '"' | sed 's/.\{1\}/&\n/g' | sort -r | uniq -c | \
      sort -nr | tr -s " " | cut -d " " -f3 | head -n1

This command generates a list of the special characters in the header line and selects from that list the character with the highest frequency of occurrence. This character is taken to be the delimiter of the file, so the command will fail when some other special character occurs more often than the delimiter.

An explanation of this code is as follows. After head grabs the column headers, the first two translate commands (tr) remove all letters, digits, and quotes. This leaves a bunch of s...
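The same frequency-count idea can be wrapped in a small function and tried on a sample file. This is a sketch rather than the post's one-liner: it swaps in fold -w1 for the sed step and awk for the tr/cut step, and the name guess_delimiter is hypothetical.

```shell
# Guess the delimiter of a CSV file from its header line.
# guess_delimiter is a hypothetical helper name, not part of any tool.
guess_delimiter() {
    head -n1 "$1" |
        tr -d 'a-zA-Z0-9"' |   # drop letters, digits, and double quotes
        fold -w1 |             # put each remaining character on its own line
        sort |
        uniq -c |              # count each distinct character
        sort -nr |             # most frequent character first
        head -n1 |
        awk '{print $2}'       # keep the character, drop the count
}

# Demo on a throwaway file with a semicolon-separated header.
sample=$(mktemp)
printf '"id";"name";"price"\n1;"pen";2.50\n' > "$sample"
guess_delimiter "$sample"      # prints ;
rm -f "$sample"
```

As with the one-liner, a header such as first_name,last_name,e_mail_addr would be misread, because the underscore occurs more often than the comma.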

Swap columns of CSV file from Linux terminal

Swapping columns is an integral part of data analysis, and with GUI spreadsheet programs it is simply a four-step process. Suppose ColumnA and ColumnB need to be swapped. Then the following sequence does the job.

1. Create a new column before ColumnA
2. Cut ColumnB into this new column
3. Cut ColumnA to the location of ColumnB
4. Delete the empty column

However, for massive databases, the spreadsheet program is neither adequate nor recommended. The software will take a long time to load the file, and may even stall while loading the large database. A simpler solution is to use AWK to swap the columns of the database. This method is extremely fast and efficient. A typical AWK command to rearrange the columns of a database will look like

    awk -F ',' 'BEGIN{OFS=",";} {print $1, $5, $3, $4, $2}' test.csv

This command swaps column 2 with column 5 of a five-column file. This command is simple and elegant. But it has its drawbacks. The user needs to type all th...
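One way to avoid listing every column is to pass the two column numbers to AWK and exchange the fields in place. The sketch below is an illustration of that idea, not necessarily the approach the post goes on to describe; swap_columns is a hypothetical name, and the split on plain commas assumes no quoted fields contain commas.

```shell
# Swap two comma-separated columns, given by their numbers.
# swap_columns is a hypothetical helper name.
swap_columns() {
    awk -F ',' -v a="$2" -v b="$3" '
        BEGIN { OFS = "," }
        { tmp = $a; $a = $b; $b = tmp; print }   # exchange the two fields
    ' "$1"
}

# Demo on a throwaway five-column file.
sample=$(mktemp)
printf 'c1,c2,c3,c4,c5\n1,2,3,4,5\n' > "$sample"
swap_columns "$sample" 2 5
# prints:
# c1,c5,c3,c4,c2
# 1,5,3,4,2
rm -f "$sample"
```

Assigning to a field makes AWK rebuild the line with OFS, which is why OFS is set to a comma in the BEGIN block.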

Testing Central Limit Theorem with R

In this article, we will verify the Central Limit Theorem, which says that the distribution of the means of samples drawn from the distribution of a random variable approaches a normal distribution as the sample size increases. Put simply, if multiple samples are taken from a distribution (normal or otherwise, as long as its variance is finite) and the mean of each sample is computed, then the collection of sample means will itself form a distribution, and that distribution will be approximately normal (provided the sample size is large). A closely related result (the law of large numbers) is that the sample mean will approach the population mean as the sample size goes to infinity (or the population limit). One way to verify this statement is to do the sampling using random variables generated by R and then calculate the sample means for each set of random numbers. Using R, we will generate a sample of N normal random numbers and repeat that sampling 20 times, each time finding the mean of the sample of the ...
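For reference, the statement being tested can be written out as below. This is the classical form for independent, identically distributed draws with finite variance; the symbols (mean \mu, variance \sigma^2, sample mean \bar{X}_n of a sample of size n) are notation introduced here and do not come from the post.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0, \sigma^2\bigr)
\quad \text{as } n \to \infty,
\qquad
\mathrm{sd}\bigl(\bar{X}_n\bigr) = \frac{\sigma}{\sqrt{n}}.
```

For example, with samples of 100 standard normal numbers (\mu = 0, \sigma = 1), the sample means should scatter around 0 with a standard deviation of about 0.1.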