I was given a long Word document which I have to port to LaTeX. In the document, all citations appear in the classic author-year form, something like:
Lorem ipsum dolor (Sit, 1998) amet, consectetur adipiscing (Slit 2000, Sed and So 2002, Eiusmod et al. 1976).
Tempor incididunt ut labore et dolore magna aliqua (Ut et al. 1312)
These references need to be replaced by the proper citation key as it appears in a list of bib references. In other words, the text should translate to:
Lorem ipsum dolor \cite{sit1998} amet, consectetur adipiscing \cite{slit2000,sed2002,eiusmod1976}.
Tempor incididunt ut labore et dolore magna aliqua \cite{ut1312}
That means:
- extract all the strings that are composed of name(s) and year enclosed in parentheses
- reduce that string to the first author's name and the year, dropping spaces and any further names (everything after the first name), and convert it to lowercase
- use the resulting string to form the new \cite{string}
I understand that this may be quite a complex task. I was wondering whether someone has already written a script for this specific task. Alternatively, any partial suggestion is also welcome. I am currently working on macOS.
Best Answer
The following awk program should work. It looks for ( ... ) elements in each line and checks if they fit the "author(s), year" or "author(s)1 year1, author(s)2 year2, ..." pattern. If so, it creates a citation command and replaces the ( ... ) group; otherwise it leaves the group as it is.
Call the program with
Note that the program expects the input to be "reasonably" well-formed, i.e. all parentheses on a line should be matched pairs (although a single closing parenthesis at the beginning of a line is allowed, and unmatched opening parentheses are ignored).
Note also that some of the syntax above requires GNU awk. To be portable to other implementations, replace
with
and ensure you have set the collation locale to C.
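If awk is not an option, the same idea can be sketched in Python. This is not the awk program from this answer, only a minimal illustration of the pattern check described above, and everything in it is an assumption: the names NAME, ONE, LIST, GROUP, to_cite and convert are made up for the example, author names are assumed to start with a capital ASCII letter, years are assumed to have exactly four digits, and co-authors are assumed to be written as "and X", "& X" or "et al.".

```python
import re

# Assumed shape of an author name: capital letter, then letters, ' or -.
NAME = r"[A-Z][A-Za-z'\-]+"
# One citation: first author (captured), optional co-authors,
# optional comma, then a four-digit year (captured).
ONE = rf"({NAME})(?:\s+(?:and|&)\s+{NAME}|\s+et\s+al\.?)?,?\s+(\d{{4}})"
# A whole group: one or more citations separated by commas or semicolons.
LIST = re.compile(rf"\s*{ONE}(?:\s*[,;]\s*{ONE})*\s*")
# A parenthesised group that contains no nested parentheses.
GROUP = re.compile(r"\(([^()]*)\)")


def to_cite(match):
    """Turn one ( ... ) group into \\cite{...}, or return it unchanged."""
    content = match.group(1)
    if not LIST.fullmatch(content):
        return match.group(0)  # not an author-year list: leave as it is
    # Key = first author's name in lowercase followed by the year.
    keys = [name.lower() + year for name, year in re.findall(ONE, content)]
    return r"\cite{" + ",".join(keys) + "}"


def convert(text):
    """Replace every author-year group in text with a \\cite command."""
    return GROUP.sub(to_cite, text)


print(convert("Lorem ipsum dolor (Sit, 1998) amet (see below)."))
```

Groups that do not fit the pattern, such as "(see below)", are left untouched. A group like "(Figure 2000)" would still be converted, so the result should be checked against the actual bib keys by hand.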