
Convert file listing to database format

Let us say we have a collection of ebooks or papers/articles sorted in various folders and we want to create a database (or spreadsheet) of those papers or books so that we can add comments or notes next to them.

For example, let us say we have a file structure like this (as listed by find . -type f):

./entanglement-entropy-holography/1006.1263.pdf
./entanglement-entropy-holography/0912.1877.pdf
./entanglement-entropy-holography/0911.3160v2.pdf
./entanglement-entropy-holography/0912.1877v2.pdf
./entanglement-entropy-holography/1010.1682.pdf
./graviton-propagator/zee-1979-PhysRevLett.42.417.pdf
./graviton-propagator/dewitt-3-PhysRev.162.1239.pdf
./graviton-propagator/dewitt-2-PhysRev.162.1195.pdf
./graviton-propagator/dewitt-1-PhysRev.160.1113.pdf
./SUSY/Piguet-9710095v1.pdf
./SUSY/Olive_susy_9911307v1.pdf
./SUSY/sohnius-introducing-susy-1985.pdf
./SUSY/khare-cooper-susy-qm-phys.rept-1995.pdf
./SUSY/Instantons Versus Supersymmetry9902018v2.pdf

and we want to convert this list into the following database (table) format:


Article                                       Type                             Notes
1006.1263.pdf                                 entanglement-entropy-holography
0912.1877.pdf                                 entanglement-entropy-holography
0911.3160v2.pdf                               entanglement-entropy-holography
0912.1877v2.pdf                               entanglement-entropy-holography
1010.1682.pdf                                 entanglement-entropy-holography
zee-1979-PhysRevLett.42.417.pdf               graviton-propagator
dewitt-3-PhysRev.162.1239.pdf                 graviton-propagator              Difficult
dewitt-2-PhysRev.162.1195.pdf                 graviton-propagator              Difficult
dewitt-1-PhysRev.160.1113.pdf                 graviton-propagator              Difficult
Piguet-9710095v1.pdf                          SUSY
Olive_susy_9911307v1.pdf                      SUSY
sohnius-introducing-susy-1985.pdf             SUSY
khare-cooper-susy-qm-phys.rept-1995.pdf       SUSY
Instantons Versus Supersymmetry9902018v2.pdf  SUSY                             Random comment

The last column (Notes) is filled in by the user after the data is imported. To import the data in the above format, we need the directory name (TYPE) and the FILENAME to be swapped and printed as columns separated by a TAB. Another delimiter would also work, but with TAB as the column delimiter a spreadsheet program will automatically split each pasted line into two columns.
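To see what "columns separated by TAB" means in practice, the short sketch below builds one row of the target format with printf (using the first file from the example) and then splits it again with cut, which, like a spreadsheet import, treats TAB as its default column separator.

```shell
#!/bin/sh
# One row of the target format: FILENAME, a TAB, then TYPE (the directory name).
row=$(printf '%s\t%s' "1006.1263.pdf" "entanglement-entropy-holography")
printf '%s\n' "$row"

# cut splits on TAB by default (no -d option needed).
printf '%s\n' "$row" | cut -f1   # first column: the file name
printf '%s\n' "$row" | cut -f2   # second column: the directory (TYPE)
```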


$ find . -type f -print | sed -r 's|(.*)\/|\1+|'  | awk -F"+" '{print $2"\t"$1}' | sed 's|\.\/||'



The find command lists all files and pipes them to sed, which replaces the last forward slash (/) in each path with a +. awk then uses this + as the field separator to split each line into two parts: the first part is the TYPE (the directory) and the second part is the FILENAME. awk prints the two fields in reverse order, FILENAME first, with a TAB between them, and a final sed removes the ./ that find puts in front of each directory name. A simple copy and paste of the output into a spreadsheet program then places the two fields in two separate columns.
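To try the pipeline without touching a real collection, here is a minimal sketch that recreates three of the example files as empty placeholders in a scratch directory made with mktemp, wraps the one-liner unchanged in a shell function (the name list_as_columns is just an illustrative choice), and shows one possible way to save the result, with a header row mirroring the table above, as a tab-separated file that a spreadsheet can open. It assumes GNU sed, since it keeps the post's -r flag.

```shell
#!/bin/sh
# Scratch copy of part of the example tree: empty placeholder files only.
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
mkdir -p entanglement-entropy-holography graviton-propagator SUSY
touch entanglement-entropy-holography/1006.1263.pdf \
      graviton-propagator/zee-1979-PhysRevLett.42.417.pdf \
      "SUSY/Instantons Versus Supersymmetry9902018v2.pdf"

# The one-liner from the post, unchanged, wrapped in a function
# (list_as_columns is a hypothetical name; -r assumes GNU sed).
list_as_columns() {
    find . -type f -print | sed -r 's|(.*)\/|\1+|' | awk -F"+" '{print $2"\t"$1}' | sed 's|\.\/||'
}

# Print FILENAME<TAB>TYPE lines to the terminal.
list_as_columns

# Optionally save them with a header row. The output file is created by
# mktemp outside the scanned directory, so find never lists it.
tsv=$(mktemp) || exit 1
{ printf 'Article\tType\tNotes\n'; list_as_columns; } > "$tsv"
echo "saved to $tsv"
```

The Notes column in the saved file is left empty, to be filled in by hand after importing.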

Detailed explanation:

find . -type f 

selects only regular files (not directories), searching recursively through all subdirectories.
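A quick way to see why -type f matters: without it, find also lists the directories themselves, and each of them would turn into a bogus row further down the pipeline. The scratch directory and the single file below are only for illustration.

```shell
#!/bin/sh
# Scratch tree: one directory containing one file.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/SUSY"
touch "$demo/SUSY/Piguet-9710095v1.pdf"
cd "$demo" || exit 1

# Without -type f, find also prints "." and "./SUSY" (the directories).
find .

# With -type f, only the regular file is printed.
find . -type f
```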

sed -r 's|(.*)\/|\1+|'  

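-r switches sed to extended regular expressions (ERE), so that ( and ) act as grouping parentheses without backslashes. -r is a GNU sed option; -E means the same thing and is also accepted by BSD/macOS sed, so it is the more portable spelling.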

| is used as the delimiter instead of the conventional / to avoid confusion, since the pattern itself has to match a / in the path.

(.*)\/ matches everything up to and including the last forward slash (/), because .* is greedy: it matches the longest string it can, so it runs past every earlier slash.

The part matched by (.*) is stored in \1 and written back unchanged, while the forward slash (/) that follows it is replaced by +.
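The sed step can be checked on its own by feeding it single paths. This sketch writes the flag as -E, which GNU sed treats the same as -r, and leaves the / unescaped, which is enough because | is the delimiter. The second path is a hypothetical nested folder, included to show that only the last slash is replaced.

```shell
#!/bin/sh
# A path taken from the example listing.
printf '%s\n' './SUSY/Piguet-9710095v1.pdf' | sed -E 's|(.*)/|\1+|'
# → ./SUSY+Piguet-9710095v1.pdf

# Greedy .*: in a (hypothetical) nested path only the LAST slash becomes "+".
printf '%s\n' './SUSY/reviews/Olive_susy_9911307v1.pdf' | sed -E 's|(.*)/|\1+|'
# → ./SUSY/reviews+Olive_susy_9911307v1.pdf
```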

awk -F"+" '{print $2"\t"$1}'

-F sets the input field separator to + so that awk splits each line at the +, which the previous sed command conveniently inserted in place of the last forward slash (/).

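'{print $2"\t"$1}' prints column 2, a TAB, and column 1 in that order, effectively swapping the columns and inserting a TAB between them.

sed 's|\.\/||'

removes the first ./ in each line. Since a file name can never contain a /, that ./ is always the one find printed in front of the directory name, so the TYPE column ends up holding just the directory name.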
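The awk step and the final sed can be tried the same way on a single line, here the line the sed step produces for ./SUSY/Piguet-9710095v1.pdf. In the comments, <TAB> marks a TAB character.

```shell
#!/bin/sh
# What the first sed turns ./SUSY/Piguet-9710095v1.pdf into.
line='./SUSY+Piguet-9710095v1.pdf'

# awk splits on "+" and prints field 2, a TAB, then field 1.
printf '%s\n' "$line" | awk -F"+" '{print $2"\t"$1}'
# → Piguet-9710095v1.pdf<TAB>./SUSY

# The final sed deletes the first "./", i.e. the prefix added by find.
printf '%s\n' "$line" | awk -F"+" '{print $2"\t"$1}' | sed 's|\.\/||'
# → Piguet-9710095v1.pdf<TAB>SUSY
```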

The output looks like this (the order of the lines depends on the order in which find walks the directories):



$ find . -type f -print | sed -r 's|(.*)\/|\1+|'  | awk -F"+" '{print $2"\t"$1}' | sed 's|\.\/||'

1006.1263.pdf                                 entanglement-entropy-holography
0912.1877.pdf                                 entanglement-entropy-holography
0911.3160v2.pdf                               entanglement-entropy-holography
0912.1877v2.pdf                               entanglement-entropy-holography
1010.1682.pdf                                 entanglement-entropy-holography
zee-1979-PhysRevLett.42.417.pdf               graviton-propagator
dewitt-3-PhysRev.162.1239.pdf                 graviton-propagator
dewitt-2-PhysRev.162.1195.pdf                 graviton-propagator
dewitt-1-PhysRev.160.1113.pdf                 graviton-propagator
Piguet-9710095v1.pdf                          SUSY
Olive_susy_9911307v1.pdf                      SUSY
sohnius-introducing-susy-1985.pdf             SUSY
khare-cooper-susy-qm-phys.rept-1995.pdf       SUSY
Instantons Versus Supersymmetry9902018v2.pdf  SUSY
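The + placeholder has one weak spot: if a file name itself contains a +, awk splits the line there as well, and part of the name ends up in the wrong column (a hypothetical SUSY/notes+draft.pdf comes out as notes followed by SUSY). A minimal sketch of one way around this, assuming any POSIX awk, lets awk find the last / on its own: with / as the field separator, $NF is the file name, and the directory is the path with that last component cut off. The function name list_as_columns_awk and the sample files are illustrative.

```shell
#!/bin/sh
# Scratch tree with one ordinary name and one name containing "+".
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/SUSY"
touch "$demo/SUSY/Piguet-9710095v1.pdf" "$demo/SUSY/notes+draft.pdf"
cd "$demo" || exit 1

# Same FILENAME<TAB>TYPE output, but without the "+" placeholder or sed -r.
list_as_columns_awk() {
    find . -type f | awk -F/ '{
        dir = $0
        sub("/[^/]*$", "", dir)   # cut "/FILENAME" off the end of the path
        sub("^[.]/", "", dir)     # drop the leading "./" added by find
        print $NF "\t" dir        # $NF is the last "/"-separated field
    }'
}

list_as_columns_awk
```

With GNU find, find . -type f -printf '%f\t%h\n' prints the file name and its directory in one step (the directory keeps its leading ./), which is another option when portability to other find implementations does not matter.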
