A script to fetch a bibliography list from inspirehep.net by parsing the LaTeX file that cites the references

While writing short scientific papers, a major part of the workload is sorting the reference list. A professional article requires the references to appear in the bibliography in the order in which they are cited in the main text (and not alphabetically, which is what most bst files do). Doing this by hand is laborious and has to be repeated every time a reference is added, so it is worth writing a script to automate the task. The following script, which I wrote, looks for the citation tags in the LaTeX file, fetches the corresponding references straight from the server at inspirehep.net, and prepares a reference file for the user to copy. We no longer need to copy and paste the references manually, and the script also sorts the references in the order of their appearance.

An example of a citation appearing in the main text:

... as it was shown in \cite{Chatterjee:2013daa} that merging black holes may not violate the second law of thermodynamics ...

The script looks for the "\cite" tag. It extracts all the citation keys into a single file, in the order in which the citations appear in the main text, and removes the duplicates. For each key in that list it then fetches the bibliography information from inspirehep.net and concatenates the results into a file called "bibtexfile". The user then copies the contents of "bibtexfile" to the end of the LaTeX file and compiles.
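The extraction and de-duplication step described above can be sketched on its own. This is only an illustration, not the script's actual pipeline: the function name extract_cite_keys is hypothetical, and the sketch assumes that each \cite{...} sits on a single line and that keys contain no spaces.

```shell
#!/bin/bash
# extract_cite_keys is a hypothetical helper name, not part of getLatex.sh.
# It prints each citation key once, in order of first appearance.
# Steps: grep -o emits every \cite{...} match on its own line, sed strips
# the \cite{ } wrapper, tr splits comma-separated keys and removes stray
# whitespace, and awk keeps only the first occurrence of each key.
extract_cite_keys() {
    grep -o '\\cite{[^}]*}' "$1" |
        sed -e 's/^\\cite{//' -e 's/}$//' |
        tr ',' '\n' |
        tr -d ' \t' |
        awk 'NF && !seen[$0]++'
}

# Usage: extract_cite_keys mynewpaper.tex
```

The awk filter plays the same role as an order-preserving "unique": unlike sort -u, it does not reorder the keys, which is the whole point here.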

For this script to work, certain command-line tools need to be installed: Perl (installed by default), lynx, and curl. We can install the last two with

sudo apt-get install lynx curl
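Before running anything, it can be useful to confirm that these tools are actually on the PATH. The following is a minimal sketch of such a check; require_tools is a hypothetical helper name, and the apt-get hint it prints assumes a Debian-like system, as above.

```shell
#!/bin/bash
# require_tools is a hypothetical helper name, not part of getLatex.sh.
# It reports every missing command on stderr and returns non-zero
# if at least one of them is absent.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "<$tool> not installed. Try: sudo apt-get install $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: require_tools perl lynx curl || exit 1
```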

Save the following script, make it executable, and pass the LaTeX file to it as an argument.

  1. Save as getLatex.sh
  2. chmod 755 getLatex.sh
  3. ./getLatex.sh mynewpaper.tex


The output will be saved to the file bibtexfile.

Copy the contents of bibtexfile to the end of your LaTeX file and compile.
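If the fetched entries are \bibitem lines (an assumption about the content of the output file, based on its being the LaTeX format from INSPIRE), they have to end up inside a thebibliography environment to compile. A small sketch of a wrapper that does this; wrap_bibliography and the output name references.tex are hypothetical.

```shell
#!/bin/bash
# wrap_bibliography is a hypothetical helper name, not part of getLatex.sh.
# It assumes the input file holds \bibitem{...} entries and prints them
# wrapped in a thebibliography environment.
wrap_bibliography() {
    # %s keeps the backslashes literal instead of treating them as escapes
    printf '%s\n' '\begin{thebibliography}{99}'
    cat "$1"
    printf '%s\n' '\end{thebibliography}'
}

# Usage: wrap_bibliography bibtexfile > references.tex
```

The resulting block can then be pasted just before \end{document}.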

==================================================================

#!/bin/bash
# This script fetches the LaTeX bibliography entries for the citations in a LaTeX file and puts them in the order in which they appear in the original document. Instead of doing it manually, this script will sort the references.

# getLatex.sh v1.1 Jones
# Usage : getLatex.sh <filename>

echo "This script fetches the LaTeX bibliography entries for the citations in a LaTeX file and puts them in the order in which they appear in the original document. Instead of doing it manually, this script will sort the references."

# check if lynx is installed (command -v fails if the command is not found)
if ! command -v lynx > /dev/null 2>&1 ; then
   echo -e "  <lynx> not installed. Install <lynx> for this script to work.\n  sudo apt-get install lynx.\n  Exiting ..."
   exit 1
fi
# check if curl is installed
if ! command -v curl > /dev/null 2>&1 ; then
   echo -e "  <curl> not installed. Install <curl> for this script to work.\n  sudo apt-get install curl.\n  Exiting ..."
   exit 1
fi

# check if user provided filename otherwise exit
if [ -z "$1" ] ; then
   echo -e "  Did not supply a file name. \n  Usage: getLatex.sh <filename>.\n  Exiting ..."
   exit 1
fi

# check if file exists
if [ ! -f "$1" ]; then
    echo -e "  File not found.\n  Exiting ..."
    exit 1
fi

# begin by cleaning temporary files (rm -f ignores files that do not exist)

rm -f .temp .temp2 .temp3 .temp4 .temp5 .temp6

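# extract the citation keys: put every cite{...} on its own line, strip the tag, split comma-separated keys,
# and keep only the first occurrence of each key so that the order of appearance is preserved
grep cite "$1" | sed 's/}/\n/g' | sed 's/cite/\ncite/g' | grep 'cite{' | sed 's/cite{//g' | sed 's/,/\n/g' | sed 's/ //g'  | perl -ne 'if (!defined $x{$_}) { print $_; $x{$_} = 1; }' > .temp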


#cat $1 | grep ":" | grep -v "ARXIV"  | grep -v "=" | sed 's/@article{//' | sed 's/,//' | sed 's/}//' | sed 's/\.//' | sed "s/:/%3A/g" | sort -u > .temp


# see if the above code worked. If it did not, then the format of the citations in the file is wrong or the file does not contain citations.
if [ ! -s .temp ] ; then
  echo -e "  Enter the references in correct format in the file: $1. They should have the tag \\\cite.\n  Exiting ..."
  exit 1
fi

for i in `cat .temp`
do
echo "Getting --[ $i ]--"
# search INSPIRE for the citation key and save the result page
curl -# "http://inspirehep.net/search?ln=en&ln=en&p=$i&of=hb&action_search=Search&sf=&so=d&rm=&rg=25&sc=0" > .temp2
# pull the link to the record's LaTeX(US) page out of the result page
link2=`grep "LaTeX(US)" .temp2 | grep record | cut -d '"' -f2 | sed "s|http://inspirehep.net||"`

# skip keys for which no record link was found
if [ -z "$link2" ] ; then
   echo "  No record found for $i. Skipping ..."
   continue
fi

# dump the LaTeX(US) page as text and keep the lines from the first "cite" up to the "HEP :: " footer
lynx -dump "http://inspirehep.net$link2" | sed -n '/cite/,/HEP :: /p' | grep -v "HEP" >> .temp6
echo "==========="
done
mv .temp6 bibtexfile

# end by cleaning temporary files

rm -f .temp .temp2 .temp3 .temp4 .temp5

echo "  File written to bibtexfile."
=======================================================================

Note: This script fetches only LaTeX bibliography entries. To fetch BibTeX entries we need another script, which I will publish next. However, using BibTeX can defeat the purpose of sorting the references in the order in which they appear in the text, because the references are sorted automatically by the BibTeX bst file, and most natbib bst files sort them alphabetically. A notable exception is h-physrev.bst from arXiv.org, which sorts the references in the order they appear in the document. So if we are using BibTeX and want our references sorted in order of appearance in the main text, we should use the h-physrev.bst file. For small articles, though, I prefer LaTeX references to BibTeX for simplicity, and this is where this script comes in handy. A script for fetching the BibTeX entries is advantageous in its own right: while writing a thesis, we do not need to copy the BibTeX-formatted references manually. We only have to pass our previously written papers to that script, which will fetch the BibTeX entries for the master.bib file.