Identifying delimiter of a CSV file

The following one-liner can be used to extract the delimiter of a CSV file. It does not work on tab-separated files; it only works on delimited files whose field separators are not whitespace characters.

$ head -n1 bookmerged.csv | tr -d '[a-z][A-Z][0-9]' | \
  tr -d '"' | sed 's/.\{1\}/&\n/g' | sort -r | uniq -c | \
  sort -nr | tr -s " " | cut -d" " -f3 | head -n1
This command builds a list of the special characters in the header line and selects the one that occurs most often, on the assumption that it is the delimiter. The method fails when some other special character occurs more often than the delimiter. The pipeline works as follows.

After head grabs the header line, the two tr (translate) commands delete all letters, digits, and double quotes. This leaves a set of special characters, among which the character with the highest frequency of occurrenc…
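For comparison, here is a minimal Python sketch of the same frequency-count idea. The function names are hypothetical, not from the original post. Like the pipeline, it assumes the delimiter appears in the header line and ignores letters, digits, and double quotes; it also skips whitespace, so it shares the same blind spot for tab-separated files.

```python
from collections import Counter


def guess_delimiter(header: str) -> str:
    """Return the most frequent special character in a CSV header line.

    Letters, digits, double quotes, and whitespace are ignored, mirroring the
    shell pipeline; on a tie, the character seen first wins.
    """
    counts = Counter(
        ch for ch in header
        if not ch.isalnum() and ch != '"' and not ch.isspace()
    )
    if not counts:
        raise ValueError("no candidate delimiter found in header")
    return counts.most_common(1)[0][0]


def guess_delimiter_of_file(path: str) -> str:
    """Read only the first line (the header) of path, like head -n1."""
    with open(path, newline="") as f:
        return guess_delimiter(f.readline())


if __name__ == "__main__":
    print(guess_delimiter('"name","age","city"'))      # ,
    print(guess_delimiter("id;first-name;last;email"))  # ;
```

Python's standard csv module also provides csv.Sniffer, whose sniff() method guesses the delimiter from a sample of the file; it may cope better when other punctuation dominates the header.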
Recent posts

Swap columns of CSV file from Linux terminal

Swapping columns is a common task in data analysis, and in a GUI spreadsheet program it takes just four steps. Suppose ColumnA and ColumnB need to be swapped. Then the following sequence does the job:
1. Create a new column before ColumnA.

2. Cut ColumnB into this new column.

3. Cut ColumnA to the location of ColumnB.

4. Delete the empty column.

For very large databases, however, a spreadsheet program is not a good fit: it may take a long time to load the file, or even stall while loading it. A simpler solution is to use AWK, which reads the file one line at a time and is therefore fast even on large files. A typical AWK command to rearrange the columns of a database looks like this:

awk -F',' 'BEGIN{OFS=",";} {print $1, $5, $3, $4, $2}' test.csv
This command swaps column 2 with column 5 while keeping columns 1, 3, and 4 in place (any columns after the fifth are dropped from the output). The command is simple and elegant, but it has its drawbacks. The user needs to type all the column numbers by hand, which …
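One way to avoid typing the full column list is to swap just two positions and leave every other column alone. The sketch below does this in Python with the standard csv module; swap_columns is a hypothetical name, and 1-based column numbers are assumed so they match AWK's $1, $2, and so on. Unlike awk -F',', the csv module does not split on commas inside quoted fields.

```python
import csv
import io
import sys


def swap_columns(src, dst, i, j, delimiter=","):
    """Copy CSV rows from src to dst, swapping 1-based columns i and j.

    All other columns keep their positions; rows too short to contain both
    columns are copied unchanged.
    """
    if min(i, j) < 1:
        raise ValueError("column numbers are 1-based")
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if max(i, j) <= len(row):
            row[i - 1], row[j - 1] = row[j - 1], row[i - 1]
        writer.writerow(row)


if __name__ == "__main__":
    # Demo on an in-memory file; the quoted field contains a comma.
    sample = io.StringIO('a,b,c,d,e\n1,"two, quoted",3,4,5\n')
    swap_columns(sample, sys.stdout, 2, 5)
    # a,e,c,d,b
    # 1,5,3,4,"two, quoted"
```

To process a real file, open both ends with newline="" as the csv documentation recommends, for example: with open("test.csv", newline="") as f_in, open("swapped.csv", "w", newline="") as f_out: swap_columns(f_in, f_out, 2, 5).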

Testing Central Limit Theorem with R

In this article, we will verify the Central Limit Theorem, which says that the distribution of sample means, for independent samples drawn from the distribution of a random variable with finite variance, approaches a normal distribution as the sample size increases. Put simply, if many samples are taken from a distribution (normal or otherwise) and the mean of each sample is computed, then the collection of sample means will itself form a distribution, and that distribution will be approximately normal, provided the sample size is large. One corollary is that the sample mean approaches the population mean as the sample size goes to infinity (or, for a finite population, to the size of the population). One way to verify this is to draw samples of random numbers in R and calculate the mean of each sample.
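In symbols, for independent draws X_1, ..., X_n from a distribution with mean μ and finite variance σ², the standard form of the theorem reads:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} \mathcal{N}(0,1)
\quad \text{as } n \to \infty .
```

So for large n the sample mean is approximately normal with mean μ and standard deviation σ/√n. Because σ/√n shrinks as n grows, the sample mean concentrates around μ, which is the corollary described above.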

Using R, we will generate a sample of N normal random numbers and repeat that sampling 20 times, each time finding the mean of the sample of the 20 …
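The post runs this experiment in R. As a rough Python translation (not the author's R code), the sketch below uses only the standard library and draws from an exponential distribution rather than a normal one, to show that the parent distribution need not be normal. The sample size n = 50, the 2000 repetitions, and the name sample_means are arbitrary, hypothetical choices. If the theorem holds, the sample means should average about 1, have a standard deviation close to 1/√50 ≈ 0.141, and about 68% of them should lie within one standard error of 1.

```python
import random
import statistics


def sample_means(n, repeats, draw, seed=0):
    """Draw `repeats` samples of size n with draw(rng); return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]


if __name__ == "__main__":
    n, repeats = 50, 2000
    # Exponential(rate=1): clearly skewed, with mean 1 and standard deviation 1.
    means = sample_means(n, repeats, lambda rng: rng.expovariate(1.0), seed=1)
    se = 1 / n ** 0.5  # theoretical standard error sigma / sqrt(n)
    within = sum(abs(m - 1) < se for m in means) / repeats
    print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean 1)")
    print(f"sd of sample means:   {statistics.stdev(means):.3f} (sigma/sqrt(n) = {se:.3f})")
    print(f"share within 1 SE:    {within:.3f} (normal: about 0.683)")
```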

Convert file listing to database format

Suppose we have a collection of e-books or papers/articles sorted into various folders, and we want to create a database (or spreadsheet) of them so that we can add comments or notes next to each one.

For example, suppose find . -type f lists a file structure like this:

./entanglement-entropy-holography/1006.1263.pdf
./entanglement-entropy-holography/0912.1877.pdf
./entanglement-entropy-holography/0911.3160v2.pdf
./entanglement-entropy-holography/0912.1877v2.pdf
./entanglement-entropy-holography/1010.1682.pdf
./graviton-propagator/zee-1979-PhysRevLett.42.417.pdf
./graviton-propagator/dewitt-3-PhysRev.162.1239.pdf
./graviton-propagator/dewitt-2-PhysRev.162.1195.pdf
./graviton-propagator/dewitt-1-PhysRev.160.1113.pdf
./SUSY/Piguet-9710095v1.pdf
./SUSY/Olive_susy_9911307v1.pdf
./SUSY/sohnius-introducing-susy-1985.pdf
./SUSY/khare-cooper-susy-qm-phys.rept-1995.pdf
./SUSY/Instantons Versus Supersymmetry9902018v2.pdf
We want to convert this list into a database format.

Ar…
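The post's own method is cut off here, so the following is only one possible approach, sketched in Python: split each path from find into its folder and file name and write them as CSV rows, with an empty column for notes. The function name listing_to_csv and the column names folder, file, and notes are hypothetical choices.

```python
import csv
import io
from pathlib import PurePosixPath


def listing_to_csv(lines):
    """Turn `find . -type f` output lines into CSV text (folder, file, notes)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["folder", "file", "notes"])
    for line in lines:
        line = line.rstrip("\n")  # keep spaces inside file names
        if not line:
            continue
        # PurePosixPath drops the leading "./" and splits off the file name.
        path = PurePosixPath(line)
        writer.writerow([str(path.parent), path.name, ""])
    return out.getvalue()


if __name__ == "__main__":
    listing = [
        "./entanglement-entropy-holography/1006.1263.pdf",
        "./graviton-propagator/zee-1979-PhysRevLett.42.417.pdf",
        "./SUSY/Instantons Versus Supersymmetry9902018v2.pdf",
    ]
    print(listing_to_csv(listing), end="")
```

To process real output, the demo could instead print listing_to_csv(sys.stdin) and be run as find . -type f | python3 listing_to_csv.py > papers.csv. File names containing commas are quoted automatically by the csv writer.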

List files with absolute pathname in linux

ls -d $PWD/*

The shell expands $PWD to the absolute path of the present working directory, and the glob * then expands to every non-hidden entry in it, so each argument passed to ls is already an absolute path.

ls prints those paths, and -d makes ls list each directory itself instead of listing its contents.

We can also list all files in all subdirectories, with paths relative to the current directory:

find . -type f
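For use inside a script, roughly equivalent listings can be produced with Python's pathlib. This is a sketch with hypothetical function names and a few stated differences: resolve() also resolves symbolic links, which $PWD may keep; the recursive listing prints paths without the leading "./" that find adds; and is_file() follows symlinks, whereas find -type f does not.

```python
import tempfile
from pathlib import Path


def absolute_entries(directory="."):
    """Roughly ls -d $PWD/*: absolute paths of the non-hidden entries, sorted.

    Directories are returned as entries, not expanded, as with ls -d.
    """
    base = Path(directory).resolve()
    return sorted(p for p in base.iterdir() if not p.name.startswith("."))


def files_recursive(directory="."):
    """Roughly find . -type f: files below directory, relative to it, sorted."""
    base = Path(directory)
    return sorted(p.relative_to(base) for p in base.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Demo on a throwaway tree: tmp/papers/a.pdf and tmp/notes.txt
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "papers").mkdir()
        (root / "papers" / "a.pdf").touch()
        (root / "notes.txt").touch()
        for p in absolute_entries(root):
            print(p)  # absolute paths of notes.txt and papers
        for p in files_recursive(root):
            print(p)  # notes.txt, then papers/a.pdf
```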


Ripping videos from DVD using Handbrake

We will use Handbrake to save DVD videos on an external hard disk. A smart TV can then play the videos when the hard disk is connected to its USB port. A word of caution: output only to MP4 format. TVs usually support only a few formats, but most of them support MP4. Here is the procedure to rip the DVD videos.

1. Insert DVD

2. Open Handbrake and select "Source"

3. Select the DVD disc

Handbrake will start scanning the DVD for video content. If it doesn't (sometimes it takes a few tries), repeat the step above and select the DVD drive as the source again. It will look like this while it is scanning:

Once scanning is done, we can begin extracting the videos (titles). This DVD has 6 episodes. We will select each title, apply a video profile to it, and then add it to the encoding queue. The video profile sets the encoding parameters; a well-chosen profile produces a small file without a visible loss of quality.
When extracting f…

Use Apple DVD drive (SuperDrive) in Linux

Attach DVD drive

See if it is recognized

$ dmesg | tail
[357251.566256] scsi 8:0:0:0: CD-ROM            Apple    SuperDrive       2.00 PQ: 0 ANSI: 0
[357251.594859] sr 8:0:0:0: [sr0] scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
[357251.594867] cdrom: Uniform CD-ROM driver Revision: 3.20
[357251.595222] sr 8:0:0:0: Attached scsi CD-ROM sr0
[357251.595424] sr 8:0:0:0: Attached scsi generic sg1 type 5

Now check the /dev folder to see which device name is assigned to the Apple drive:

$ cd /dev/
$ ls -l sr*
brw-rw----+ 1 root cdrom 11, 0 May 22 20:49 sr0

Check whether the device showed up as sr0. If the drive is identified as sr1 (for example, because another DVD drive is already plugged in), replace sr0 with sr1 in the following commands.

Make sure sg_raw is installed ($ which sg_raw). If it is not, install it first:

$ sudo apt-get install sg3-utils

The following command sends the drive the bytes that initialize it:

$ sg_raw /dev/sr0 EA 00 00 00 00 00 01
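The manual steps above (find which sr device is the SuperDrive, then send it the initialization bytes) could also be scripted. The Python sketch below is hypothetical and not from the original post. It assumes the kernel exposes the SCSI model string at /sys/block/srN/device/model (the dmesg output above reports the model as SuperDrive), that sg_raw from sg3-utils is on the PATH, and that the user has permission to access the device; otherwise it may need to be run with sudo.

```python
import subprocess
from pathlib import Path

# Command bytes copied from the sg_raw invocation in the post.
INIT_BYTES = ["EA", "00", "00", "00", "00", "00", "01"]


def find_superdrives(sys_block=Path("/sys/block")):
    """Return /dev paths of sr* devices whose SCSI model string is SuperDrive.

    Assumes the kernel exposes the model at /sys/block/srN/device/model.
    """
    drives = []
    for dev in sorted(sys_block.glob("sr*")):
        try:
            model = (dev / "device" / "model").read_text().strip()
        except OSError:
            continue  # attribute missing or unreadable: skip this device
        if model == "SuperDrive":
            drives.append(Path("/dev") / dev.name)
    return drives


def init_command(device):
    """Build the sg_raw argument list for one device."""
    return ["sg_raw", str(device), *INIT_BYTES]


if __name__ == "__main__":
    drives = find_superdrives()
    if not drives:
        print("No SuperDrive found under /sys/block")
    for drive in drives:
        cmd = init_command(drive)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # may need sudo, depending on permissions
```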

Insert a di…