A script to fetch a bibliography list from inspirehep.net by parsing the LaTeX file that cites the references

While writing short scientific papers, a major part of the workload is sorting the reference list. A professional article requires the references to be listed in the bibliography in the order in which they appear in the main text (and not alphabetically, as most LaTeX bst files do). Doing this manually is laborious and has to be repeated every time a reference is added, so it is worthwhile to write a script that automates this tedious task. The following script, which I wrote, looks for the citation tags of the references, fetches the references straight from the server at inspirehep.net, and prepares a reference file for the user to copy. Not only do we no longer need to copy and paste the references manually, but the script also sorts the references in the order of their appearance.

An example of a citation appearing in the main text:

... as it was shown in \cite{Chatterjee:2013daa} that merging black holes may not violate the second law of thermodynamics .....

The script looks for the "\cite" tag and extracts all the citation keys into a single file, in the order in which the citations appear in the main text. It then removes all duplicate references. For each key in that list it fetches the bibliography information from inspirehep.net and concatenates the entries into a file called "bibtexfile". The user then copies the contents of "bibtexfile" to the end of the LaTeX file and compiles.
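The extraction and de-duplication step described above can be sketched on its own as follows. This is a minimal illustration, not the pipeline used in the script itself: the function name extract_cite_keys is hypothetical, and it assumes citations are written as plain \cite{key1,key2} (variants such as \citep or \cite[...]{...} are not handled).

```shell
#!/usr/bin/env bash
# Sketch: list citation keys in order of first appearance, without duplicates.
# extract_cite_keys is a hypothetical helper name, not part of getLatex.sh.
extract_cite_keys() {
    # grep -o prints every \cite{...} match on its own line, in document order
    grep -o '\\cite{[^}]*}' "$1" |
        sed -e 's/^\\cite{//' -e 's/}$//' |  # strip the \cite{ ... } wrapper
        tr ',' '\n' |                        # one key per line for \cite{a,b}
        tr -d ' \t' |                        # drop spaces around the keys
        awk 'NF && !seen[$0]++'              # keep only the first occurrence
}
```

Calling extract_cite_keys mynewpaper.tex prints one key per line. The awk filter is what preserves the order of appearance, which sort -u would not do.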

For this script to work, certain command line tools need to be installed: Perl (installed by default), lynx and curl. We can install the latter two with

sudo apt-get install lynx curl

Save the following script, make it executable, and pass the LaTeX file to it as an argument.

  1. Save as getLatex.sh
  2. chmod 755 getLatex.sh
  3. ./getLatex.sh mynewpaper.tex


The output will be saved to the file "bibtexfile".

Copy the contents of "bibtexfile" to the end of your LaTeX file and compile.
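Instead of copying by hand, the output can be wrapped into a file that the paper includes. The sketch below assumes that bibtexfile contains \bibitem entries, which is what the LaTeX(US) format is expected to provide. The helper name wrap_bibliography and the output name references.tex are hypothetical, and the {99} argument is only the customary widest-label placeholder.

```shell
#!/usr/bin/env bash
# Sketch: wrap the fetched entries in a thebibliography environment.
# wrap_bibliography is a hypothetical helper name, not part of getLatex.sh.
wrap_bibliography() {
    # printf '%s\n' prints its argument literally, so the backslashes survive
    printf '%s\n' '\begin{thebibliography}{99}'
    cat "$1"
    printf '%s\n' '\end{thebibliography}'
}
```

For example, wrap_bibliography bibtexfile > references.tex followed by \input{references} just before \end{document} pulls the sorted list into the paper.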

==================================================================

#!/bin/bash
# This script fetches the LaTeX bibliography entries for the references cited in a LaTeX file and puts them in the order in which they appear in the document, so the references need not be sorted manually.

# getLatex.sh v1.1 Jones
# Usage : getLatex.sh <filename>

echo "This script fetches the LaTeX bibliography entries for the references cited in a LaTeX file and puts them in the order in which they appear in the document."

# check if lynx is installed
# (command -v returns non-zero when the tool is missing; the old unquoted `which` test broke on empty output)
if ! command -v lynx > /dev/null 2>&1 ; then
   echo -e "  <lynx> not installed. Install <lynx> for this script to work.\n  sudo apt-get install lynx.\n  Exiting ..."
   exit 1
fi
# check if curl is installed
if ! command -v curl > /dev/null 2>&1 ; then
   echo -e "  <curl> not installed. Install <curl> for this script to work.\n  sudo apt-get install curl.\n  Exiting ..."
   exit 1
fi

# check if user provided filename otherwise exit
if [ -z "$1" ] ; then
   echo -e "  Did not supply a file name. \n  Usage: getLatex.sh <filename>.\n  Exiting ..."
   exit 1
fi

# check if file exists (quoted so that file names with spaces work)
if [ ! -f "$1" ]; then
    echo -e "  File not found.\n  Exiting ..."
    exit 1
fi

# begin by cleaning temporary files
rm -f .temp .temp2 .temp3 .temp4 .temp5 .temp6

# extract the citation keys in the order of their first appearance and drop duplicates
grep cite "$1" | sed 's/}/\n/g' | sed 's/cite/\ncite/g' | grep cite | sed 's/cite{//g' | sed 's/,/\n/g' | sed 's/ //g'  | perl -ne 'if (!defined $x{$_}) { print $_; $x{$_} = 1; }' > .temp


#cat $1 | grep ":" | grep -v "ARXIV"  | grep -v "=" | sed 's/@article{//' | sed 's/,//' | sed 's/}//' | sed 's/\.//' | sed "s/:/%3A/g" | sort -u > .temp


# see if the above code worked. If it did not, then the format of the references in the file is wrong or the file does not contain references.
if [[ -z "$(cat .temp)" ]] ; then
  echo -e "  Enter the references in correct format in the file: $1. They should have the tag \\\cite.\n  Exiting ..."
  exit 1
fi

for i in $(cat .temp)
do
    echo "Getting --[ $i ]--"
    #echo "http://inspirehep.net/search?ln=en&ln=en&p=$i&of=hb&action_search=Search&sf=&so=d&rm=&rg=25&sc=0"
    # search inspirehep for the citation key
    curl -# "http://inspirehep.net/search?ln=en&ln=en&p=$i&of=hb&action_search=Search&sf=&so=d&rm=&rg=25&sc=0" > .temp2
    # pick the link to the LaTeX(US) version of the record from the search results
    link2=$(grep "LaTeX(US)" .temp2 | grep record | cut -d '"' -f2 | sed "s|http://inspirehep.net||")

    # dump that page as plain text and keep the part from the cite line up to the page footer
    lynx -dump "http://inspirehep.net$link2" > .temp4
    sed -n '/cite/,/HEP :: /p' .temp4 > .temp5
    grep -v "HEP" .temp5 >> .temp6
    echo "==========="
    #exit
done
mv .temp6 bibtexfile

# end by cleaning temporary files
rm -f .temp .temp2 .temp3 .temp4 .temp5

echo "  File written to bibtexfile."
=======================================================================

Note: This script fetches only LaTeX bibliography entries. To fetch BibTeX entries we need another script, which I will publish next. However, using BibTeX defeats the purpose of sorting the references in the order in which they appear in the text, because the BibTeX bst file sorts them automatically. Most natbib bst files sort the references alphabetically. A notable exception is h-physrev.bst from arXiv.org, which sorts the references in the order they appear in the document. So if we are using BibTeX and want our references sorted in the order of their appearance in the main text, we should use the h-physrev.bst file. For small articles, though, I prefer LaTeX references to BibTeX for simplicity, and this is where this script comes in handy. A script for fetching the BibTeX entries is nevertheless advantageous in its own right: while writing a thesis, we do not need to copy the BibTeX-formatted references manually. We only have to pass our previously written papers to that script, which will fetch the BibTeX entries for the master.bib file.
