Posts

Showing posts from November, 2014

A script to fetch a bibliography list from inspirehep by parsing the LaTeX file that cites the references

When writing short scientific papers, much of the effort goes into sorting the reference list. A professional article requires the references to appear in the bibliography in the order in which they are cited in the main text (and not alphabetically, which is what most LaTeX bst files do). Doing this by hand is laborious and has to be repeated every time a reference is added, so it is worthwhile to write a script that automates the task. The following script that I wrote looks for the citation tags of the references, fetches the references straight from the server at inspirehep.net, and prepares a reference file for the user to copy. We no longer need to copy and paste the references manually, and the script also sorts the references in the order of their appearance.

An example of a citation appearing in the main text:

... as it was shown in \cite{Chatterjee:2013daa} that merging black holes may not violate the second law of thermodynamics …
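The first step described above, collecting the citation keys in the order they first appear, can be sketched in shell as follows. This is only a minimal sketch, not the script from the post: the function name cite_keys is hypothetical, it assumes plain \cite{...} commands without optional arguments, and it leaves out the request to inspirehep.net entirely.

```shell
# Hypothetical helper: print every citation key of a LaTeX file once,
# in order of first appearance. Assumes plain \cite{key1,key2} commands.
# Steps: extract each \cite{...}, strip the wrapper, split comma-separated
# keys onto their own lines, trim spaces, drop repeats while keeping order.
cite_keys() {
    grep -o '\\cite{[^}]*}' "$1" |
        sed 's/^\\cite{//; s/}$//' |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        awk 'NF && !seen[$0]++'
}
```

Calling cite_keys paper.tex (with paper.tex standing in for your own file) prints one key per line, and that list can then drive whatever lookup is used to fetch the entries.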

sed one-liners explained

1. To select blocks of text, and then only the first matching block

sed -n '/PATTERN START/,/PATTERN END/p'

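-n suppresses sed's automatic printing of every line, so only the lines printed explicitly by p appear (without -n, each line inside a block would be printed twice)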

or

sed '/PATTERN START/,/PATTERN END/!d'

has the same effect. It deletes every line that does not fall inside a block delimited by the two patterns.

But this selects ALL matching blocks. Worse, if there is a PATTERN START with no PATTERN END after it, sed keeps the range open until EOF and prints everything from that PATTERN START onwards. This is something we do not want.

PATTERN START
......
......
PATTERN END

PATTERN START
......
......
EOF
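The behaviour can be reproduced with a small made-up input. The variable names and the sample text below are assumptions for illustration only: one complete block, followed by a block that is never closed.

```shell
# Hypothetical sample: a complete block, then a block with no PATTERN END.
sample='header
PATTERN START
first block
PATTERN END
between
PATTERN START
unterminated block
trailer'

# The range form prints both blocks; the second one runs on to the last line.
all_blocks=$(printf '%s\n' "$sample" | sed -n '/PATTERN START/,/PATTERN END/p')
printf '%s\n' "$all_blocks"
```

Only the lines "header" and "between" are dropped; everything after the second PATTERN START, including "trailer", is printed.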


So we have to make sed quit after matching the first block.

sed '/PATTERN START/,/PATTERN END/!d;/PATTERN END/q'

The ; ends the first command, and sed goes on to the next one, /PATTERN END/q, which tells it to quit on reaching a line that matches PATTERN END. Lines outside a block never get that far, because d deletes them and immediately starts the next cycle, so q can only fire on the closing line of the first block. That line is still printed before sed exits.
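A quick check of the quitting one-liner on a made-up input (the sample text and variable names are assumptions for illustration) shows that only the first block survives, even though a second, unterminated block follows it.

```shell
# Hypothetical sample: a complete block, then a block with no PATTERN END.
sample='header
PATTERN START
first block
PATTERN END
between
PATTERN START
unterminated block
trailer'

# Delete everything outside a block, then quit at the first PATTERN END.
first_block=$(printf '%s\n' "$sample" |
    sed '/PATTERN START/,/PATTERN END/!d;/PATTERN END/q')
printf '%s\n' "$first_block"
```

Because sed exits at the first PATTERN END, it never reads the rest of the input, which also makes this form cheaper on large files.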

Also see: http://www.catonmat.net/blog/sed-one-liners-explained-part-three/