eBib-Biblio interface, aka. Mendeley for Emacs

Motivation

I do a lot of bibliographic research, both for actual publications (papers, lecture notes, books) and for personal use. I have come to appreciate having a well-organized database of references, especially around a topic I am really interested in.

Until recently I used Mendeley. However, it has become rather uncomfortable of late, especially if you want to keep multiple computers synchronized without using their cloud account. Moreover, it had problems when exporting to BibTeX: nested folders gave rise to duplicate entries.

For this reason I have moved to pure BibTeX databases. I keep one for each area of interest. I use ebib, an Emacs package, to navigate and edit those databases. When writing manuscripts, I also use AUCTeX (Emacs' big LaTeX mode) together with the wonderful RefTeX package.

The only thing I was missing until now was a link from the browser to ebib/AUCTeX/Emacs, something like what Mendeley and Zotero provide: you open the web page of a publication, click a bookmarklet and the article is automagically added to your database.

I know that some larger frameworks, such as org-ref, provide such a facility, but it is tied to org-mode and to a very particular configuration that does not interact with ebib.

My solution

I have created a personal tool that links my browser to Emacs, allowing me to download the BibTeX entry associated with a manuscript I am looking at, edit it and incorporate it into a database that is currently open. The tool links three other libraries:

  • ebib for managing a bibtex database,
  • biblio for querying information about manuscripts by DOI,
  • org-protocol for allowing Emacs to intercept messages from your browser.

It is not particularly easy to set up, especially if you work on Windows, but here are the instructions and the file.

Step 0: install dependencies

Make sure you have ebib and biblio installed; org-protocol ships with Org mode. Missing packages can be installed by typing M-x (i.e. the Alt key pressed together with x), then package-install, and selecting the package from the list.

Step 1: download source code

You can download the source code for this library here. This file is automatically generated from the org-mode file that creates this page. You can see the actual code in the following sections.

Step 2: register a new protocol

You need to tell the operating system that it has to call Emacs whenever it finds a link that begins with org-protocol://. Under Windows, you need a registry file such as the one below.

REGEDIT4

[HKEY_CLASSES_ROOT\org-protocol]
@="URL:Org Protocol"
"URL Protocol"=""
[HKEY_CLASSES_ROOT\org-protocol\shell]
[HKEY_CLASSES_ROOT\org-protocol\shell\open]
[HKEY_CLASSES_ROOT\org-protocol\shell\open\command]
@="emacsclientw.exe \"%1\""

If needed, edit the file so that emacsclientw.exe includes the full path of the executable you will run, and then double-click the registry file to install it. You may need to enter the administrator password or confirm in some way, as this modifies the system registry.
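Note that emacsclientw.exe can only hand the link over to an Emacs session that is running a server. A minimal sketch for your init file, assuming you do not already start the server elsewhere:

```elisp
;; Start the Emacs server unless one is already running, so that
;; emacsclient (and thus org-protocol links) can reach this session.
(require 'server)
(unless (server-running-p)
  (server-start))
```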

Step 3: load my library

Assuming you rely on use-package, add this section to your .emacs file, instructing use-package where to find the utility I created.

(use-package ebib-biblio-interface
  :after (ebib biblio org-protocol)
  :load-path "location/of/the/library/")

Step 4: add a bookmarklet to your browser

Create a new bookmark with the following text in the Location: field. You can call this bookmark Save to ebib. If you use Firefox, in the Keyword: field you can enter :ebib.

javascript:location.href='org-protocol://ebib-biblio-interface?url='+encodeURIComponent(location.href)
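To check the setup without a browser, you can build the same kind of link from Emacs. The sketch below is only an illustration: the helper name my/ebib-biblio-protocol-url is made up for this example, and url-hexify-string stands in for the browser's encodeURIComponent (the two agree for ordinary article addresses like the one shown).

```elisp
(require 'url-util)

(defun my/ebib-biblio-protocol-url (page-url)
  "Return the org-protocol link that the bookmarklet would open for PAGE-URL.
Hypothetical helper, not part of the library."
  (concat "org-protocol://ebib-biblio-interface?url="
          (url-hexify-string page-url)))

(my/ebib-biblio-protocol-url
 "https://www.nature.com/articles/s41598-020-63093-6")
;; => "org-protocol://ebib-biblio-interface?url=https%3A%2F%2Fwww.nature.com%2Farticles%2Fs41598-020-63093-6"
```

Passing the resulting string as the argument of emacsclient should trigger the same handler as clicking the bookmark.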

The source code

License

;;; -*- lexical-binding: t -*-
;;;
;;; ebib-biblio-interface.el --- downloading bibtex files from webpages
;;;
;;; Copyright (C) 2020 Juan Jose Garcia-Ripoll
;;
;; All rights reserved.

;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are met:
;; 1. Redistributions of source code must retain the above copyright
;;    notice, this list of conditions and the following disclaimer.
;; 2. Redistributions in binary form must reproduce the above copyright
;;    notice, this list of conditions and the following disclaimer in the
;;    documentation  and/or other materials provided with the distribution.
;; 3. Neither the names of the copyright holders nor the names of any
;;    contributors may be used to endorse or promote products derived from
;;    this software without specific prior written permission.
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
;; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
;; IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
;; ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
;; LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
;; CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
;; SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
;; INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
;; CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
;; ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
;; POSSIBILITY OF SUCH DAMAGE.
;;
;;; Version: 0.1
;;; Author: Juan Jose Garcia-Ripoll <[email protected]>
;;; Keywords: org mode, plain text, notes, Deft, Simplenote, Notational Velocity

;; This file is not part of GNU Emacs.

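(require 'org-protocol)
(require 'biblio)
;; ebib provides `ebib--execute-when' and `ebib-index-mode-map', both
;; of which are needed when this file is loaded.
(require 'ebib)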

Register new protocol

This section is central to the bibliographic management. It instructs org-protocol to parse a special type of URL that contains the address of a publication's web page. Our handler will then:

  1. Download that webpage.
  2. Parse the web page, looking for all references to manuscripts (i.e. DOI's)
  3. Collect those DOI's and gather information from CrossRef about the manuscripts
  4. Present you with a list, so that you can select which manuscript (just one) you want to add to ebib.
  5. Download the BibTeX and let you edit it.
  6. Once you are finished, press C-c C-c to incorporate it into your database. Alternatively, press C-c C-k to abort.
;;
;; org-protocol://ebib-biblio-interface?url=encoded-url-text
;; Register Firefox bookmark with the format
;; javascript:location.href='org-protocol://ebib-biblio-interface?url='+encodeURIComponent(location.href)
;;
(defun ebib-biblio-interface-handler (fname)
  (message "Received fname %S" fname)
  (let* ((splitparts (org-protocol-parse-parameters fname nil '(:url)))
		 (uri (org-protocol-sanitize-uri (plist-get splitparts :url))))
	(raise-frame)
	(ebib-biblio-interface-handler--inner uri)))

(defun ebib-biblio-interface-handler--inner (uri)
  "Given the URI of a manuscript, guess its DOI or find it out in the webpage
content, and query CrossRef for that manuscript, offering the user the
possibility to save that record.

Example of use:

(ebib-biblio-interface-handler--inner \"https://www.nature.com/articles/s41598-020-63093-6\")
"
  (message "Registering bibliography item with url %s" uri)
  (let ((biblio-synchronous t)
		;; Bound here so that the `setq' forms below stay local.
		dois)
	(cond ((string-match "https?://arxiv\\.org/\\(abs\\|pdf\\)/" uri)
		   (message "Found arXiv reference %s" uri)
		   (biblio-arxiv-lookup
			(replace-regexp-in-string "https?://arxiv\\.org/\\(abs\\|pdf\\)/" "" uri)))
		  ((setq dois (ebib-biblio-interface-guess-doi uri))
		   (message "Found DOI %s" dois)
		   ;; When there is a single DOI, we can directly try to download
		   ;; the Bibtex record
		   (biblio-doi-forward-bibtex (biblio-cleanup-doi dois)
									  'ebib-capture-raw-bibtex))
		  ((setq dois (ebib-biblio-interface-search-dois-in-url uri))
		   (message "Found DOI's %s" dois)
		   (if (cdr dois)
			   ;; Only use CrossRef when we really have more than one DOI
			   ;; to choose from.
			   (biblio-multiple-doi-lookup dois)
			 (biblio-doi-forward-bibtex (biblio-cleanup-doi (car dois))
										'ebib-capture-raw-bibtex)))
		  (t
		   (message "No DOI's found in %s" uri))))
  ;; Return NIL to prevent any buffer from being opened
  nil)

(unless (assoc "ebib-biblio-interface" org-protocol-protocol-alist)
  (add-to-list 'org-protocol-protocol-alist
			   '("ebib-biblio-interface"
				 :protocol "ebib-biblio-interface"
				 :function ebib-biblio-interface-handler)))

Save records from Biblio to ebib

;; In biblio's selection buffer, bind C-i (the same key as TAB) to the
;; function that saves the selected record to ebib.
(define-key biblio-selection-mode-map [?\C-i] 'ebib-biblio-interface-export-record-and-quit)

(defun ebib-biblio-interface-export-record-and-quit ()
  (interactive)
  (biblio--selection-forward-bibtex
   (lambda (entry _metadata)
	 (quit-window)
	 (ebib-capture-raw-bibtex entry))))

Query DOI's in Crossref as biblio buffer

This section of the library provides a new interface for biblio to look up manuscripts in CrossRef using their DOI's. We use this interface to produce a good looking view of the manuscript.

(defun biblio-crossref-doi--url (query)
  "Create a CrossRef url to look up DOI."
  (format "http://api.crossref.org/works?filter=doi:%s%s"
		  (biblio-cleanup-doi query)
		  (if biblio-crossref-user-email-address
			  (format "&mailto=%s" (url-encode-url biblio-crossref-user-email-address)) "")))

(defun biblio-crossref-doi-backend (command &optional arg &rest more)
  "A CrossRef backend that looks up a manuscript by its DOI.
COMMAND, ARG, MORE: See `biblio-backends'."
  (pcase command
	(`name "CrossRef DOI's")
	(`prompt "CrossRef DOI's: ")
	(`url (biblio-crossref-doi--url arg))
	(`parse-buffer (biblio-crossref--parse-search-results))
	(`forward-bibtex (biblio-crossref--forward-bibtex arg (car more)))
	(`register (add-to-list 'biblio-backends #'biblio-crossref-doi-backend))))

(defun biblio-crossref-doi-lookup (&optional query)
  "Start a CrossRef search for DOI, prompting if needed."
  (interactive "MDOI: ")
  (biblio-lookup #'biblio-crossref-doi-backend query))

This interface is extended to handle multiple DOI's, as sometimes a webpage may contain references to multiple manuscripts and we need to choose one.

(defun biblio-combined-backend (command &optional arg &rest more)
  "A placeholder backend that labels the buffer of combined results.
COMMAND, ARG, MORE: See `biblio-backends'."
  (pcase command
	(`name "Combined backend")
	(`prompt "Combined backend query: ")
	(`url "")
	(`parse-buffer nil)
	(`forward-bibtex nil)
	(`register nil)))

(defun biblio--lookup-n (backend-query-pairs &optional results-buffer results)
  (unless results-buffer
	(setq results-buffer
		  (biblio--make-results-buffer (current-buffer) "multiple query" 'biblio-combined-backend)))
  (if backend-query-pairs
	  (let ((backend (caar backend-query-pairs))
			(query (cdar backend-query-pairs)))
		(biblio-url-retrieve
		 (funcall backend 'url query)
		 (biblio-generic-url-callback
		  (lambda ()
			(biblio--lookup-n (cdr backend-query-pairs) results-buffer
							  (append results (biblio--tag-backend backend (funcall backend 'parse-buffer))))))))
	(message "Tip: learn to browse results with `h'")
	(with-current-buffer results-buffer
	  (biblio-insert-results results "Combined results"))))

(defun biblio-multiple-doi-lookup (doi-list)
  (biblio--lookup-n (mapcar #'(lambda (x) (cons 'biblio-crossref-doi-backend x))
							doi-list)))

Collect all DOI's in a web page

Once we have this, we create a function that takes care of downloading a web page, finding all DOI's in the page and presenting the user with a list of papers, to choose which one will be saved into ebib.

(defun ebib-biblio-interface-delete-duplicates (list)
  (let ((output nil))
	(dolist (elt list output)
	  (unless (member elt output)
		(push elt output)))))

(defun ebib-biblio-interface-search-dois-in-url (url)
  "Search all DOI urls or codes in a web page, using formats that
are known from journals.

Example:
(ebib-biblio-interface-search-dois-in-url \"https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.85.82\")
=> '(\"10.1103/PhysRevLett.85.82\")
"
  (with-current-buffer (url-retrieve-synchronously url)
	(message "Retrieved %d characters from %s" (buffer-size) url)
	(prog1 (ebib-biblio-interface-delete-duplicates
		(or
		 ;; APS
		 (ebib-biblio-interface-collect-regexps
		  "<meta[ ]*content=\"doi:\\([^\"]+\\)\"" 1)
		 ;; SIAM
		 (ebib-biblio-interface-collect-regexps
		  "<meta[ ]*name=\"dc.Identifier\"[ ]*scheme=\"doi\"[ ]*content=\"\\([^\"]+\\)" 1)
		 ;; Quantum Journal
		 (ebib-biblio-interface-collect-regexps
		  "<meta[ ]*name=\"\\(citation_doi\\|doi\\)\"[ ]*content=\"\\([^\"]+\\)\"" 2)
		 ;; IEEE
		 (ebib-biblio-interface-collect-regexps
		  ",\"doi\":\"\\([^\"]+\\)\"" 1)
		 ;; Look for all DOI's
		 (ebib-biblio-interface-collect-regexps
		  "[\"']https?://\\(dx.\\)?doi.org/\\(?1:[^\"'& ]+\\)[\"']" 1)))
	  (kill-buffer))))

(defvar ebib-biblio-interface-url-patterns
  '(;; IOP
	("iopscience.iop.org/article/\\(?1:[0-9]+\\.[0-9]+\\(/[a-zA-Z0-9-]+\\)+\\)\\(/meta\\|/full\\|/pdf\\)")
	("iopscience.iop.org/article/\\(?1:[0-9]+\\.[0-9]+\\(/[a-zA-Z0-9-]+\\)+\\)")
	;; Springer
	("link.springer.com/article/\\([0-9]+\\.[0-9]+/[^/]+\\)")
	;; American Physical Society
	("journals.aps.org/\\(pr[abcdex]\\|prappl\\|prx\\|prxquantum\\|rmp\\)/abstract/\\(?1:[^/]+/[^/]+\\)")
	;; Wiley Online Library
	("onlinelibrary.wiley.com/doi/full/\\(?1:[0-9]+\\.[0-9]+\\/[^/]+\\)")
	;; Frontiers
	("www.frontiersin.org/articles/\\(?1:[0-9]+\\.[0-9]+/[^/]+\\)/")
	;; Nature, Scientific Reports, and others
	("www.nature.com/articles/\\(?1:[0-9a-z-]+\\)" "10.1038/"))
  "Patterns to recognize DOI's from the URL's of scientific journals.")

(defun ebib-biblio-interface-guess-doi (url)
  "Guess the DOI directly from the URL, saving us some time.

Examples:

(ebib-biblio-interface-guess-doi
  \"https://iopscience.iop.org/article/10.1209/0295-5075/125/30004/meta\")
  => \"https://dx.doi.org/10.1209/0295-5075/125/30004\"
(ebib-biblio-interface-guess-doi
  \"https://journals.aps.org/rmp/abstract/10.1103/RevModPhys.92.011003\")
  => \"https://dx.doi.org/10.1103/RevModPhys.92.011003\"
(ebib-biblio-interface-guess-doi
  \"https://www.nature.com/articles/d41586-020-00926-4\")
  => \"https://dx.doi.org/10.1038/d41586-020-00926-4\"
"
  (with-temp-buffer
	;; Sometimes %2F appears in links instead of /, especially in
	;; Springer URLs
	(insert (url-unhex-string url))
	(catch 'found
	  (dolist (record ebib-biblio-interface-url-patterns)
		(let ((pattern (concat "^https?://" (car record)))
			  (prefix (or (cadr record) "")))
		  (goto-char (point-min))
		  (if (re-search-forward pattern nil t)
			  (let ((doi (match-string-no-properties 1)))
				(when doi
				  (throw 'found (concat "https://dx.doi.org/" prefix doi))))))
	  nil))))
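The idea behind each record in the pattern list can be tried in isolation. This is a simplified sketch, not the library's code: my/doi-from-url is a hypothetical helper that applies a single pattern and prefix with string-match, instead of looping over the whole list in a temporary buffer.

```elisp
(defun my/doi-from-url (url pattern &optional prefix)
  "Return PREFIX plus group 1 of PATTERN matched against URL, or nil.
Hypothetical helper for illustration only."
  (when (string-match (concat "\\`https?://" pattern) url)
    (concat (or prefix "") (match-string 1 url))))

;; Nature URLs only carry the DOI suffix, so the record supplies the prefix.
(my/doi-from-url "https://www.nature.com/articles/d41586-020-00926-4"
                 "www\\.nature\\.com/articles/\\(?1:[0-9a-z-]+\\)"
                 "10.1038/")
;; => "10.1038/d41586-020-00926-4"
```

Keeping the optional prefix next to the pattern lets one table cover both publishers that embed the full DOI in the address and those that only embed its suffix.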

This helper is used above to parse HTML files, looking for sections where DOI's are found. It uses Emacs' powerful regexp engine.

(defun ebib-biblio-interface-collect-regexps (regexp &optional count from-point)
  "Return a list with all matches of REGEXP, either from the
beginning or from the current position in the buffer. By default
it returns the whole match, but COUNT may be a number denoting
a parenthetical expression."
  (let ((output '())
		(position (point)))
	(unless from-point
	  (goto-char (point-min)))
	(while (re-search-forward regexp nil t)
	  (push (match-string-no-properties (or count 0)) output))
	(goto-char position)
	output))
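As a quick illustration of the technique, the sketch below runs the same kind of search over an HTML fragment held in a temporary buffer. The helper name my/collect-matches and the sample markup are made up for this example; unlike the function above, it returns the matches in document order.

```elisp
(defun my/collect-matches (regexp group html)
  "Return every match of GROUP in REGEXP found in the string HTML.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let ((output '()))
      (while (re-search-forward regexp nil t)
        (push (match-string-no-properties group) output))
      ;; `push' builds the list backwards, so restore document order.
      (nreverse output))))

(my/collect-matches
 "<meta[ ]*name=\"citation_doi\"[ ]*content=\"\\([^\"]+\\)\"" 1
 (concat "<meta name=\"citation_doi\" content=\"10.1103/PhysRevLett.85.82\">\n"
         "<meta name=\"citation_doi\" content=\"10.1103/PhysRevLett.83.1715\">"))
;; => ("10.1103/PhysRevLett.85.82" "10.1103/PhysRevLett.83.1715")
```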

Edit one or more Bibtex entries and add them to ebib

This is a minor mode that we use to edit the BibTeX entry that has been selected by the user. Pressing C-c C-c in this minor mode saves the BibTeX entry into the database, while pressing C-c C-k aborts the edit and discards the record.

(defvar ebib-capture-mode-map
  (let ((map (make-sparse-keymap)))
	(define-key map "\C-c\C-c" #'ebib-capture-finalize)
	(define-key map "\C-c\C-k" #'ebib-capture-kill)
	map)
  "Keymap for `ebib-capture-mode', a minor mode.
  Use this map to set additional keybindings for the BibTeX
  capture buffer.")

(defvar ebib-capture-mode-hook nil
  "Hook for the `ebib-capture-mode' minor mode.")

(define-minor-mode ebib-capture-mode
  "Minor mode for special key bindings in a capture buffer.

  Turning on this mode runs the normal hook `ebib-capture-mode-hook'."
  :lighter " Cap"
  :keymap ebib-capture-mode-map
  (setq-local
   header-line-format
   (substitute-command-keys
	"\\<ebib-capture-mode-map>Capture buffer.  Finish \
  `\\[ebib-capture-finalize]', abort `\\[ebib-capture-kill]'.")))

(defun ebib-capture-kill ()
  "Abort the current capture process."
  (interactive)
  ;; FIXME: This does not do the right thing, we need to remove the
  ;; new stuff by hand it is easy: undo, then kill the buffer
  (quit-window t))

(defun ebib-capture-finalize (&optional prefix)
  "Save entries and finalize. Use C-u as prefix to also switch to ebib."
  (interactive "P")
  (ebib--execute-when
	((or slave-db filtered-db)
	 (error "[Ebib] Cannot merge into a filtered or a slave database"))
	(real-db
	 (let* ((ebib--cur-db (ebib-select-database))
			(keys (ebib-capture-consistent-buffer)))
	   (when keys
		 (let ((result (ebib--bib-find-bibtex-entries ebib--cur-db nil)))
		   (ebib--log 'message "%d entries, %d @Strings and %s @Preamble found in file."
					  (car result)
					  (cadr result)
					  (if (nth 2 result) "a" "no")))
		 (ebib--set-modified t ebib--cur-db)
		 (ebib-capture-kill)
		 (ebib--update-buffers)
		 (when prefix
		   (ebib)
		   (ebib--goto-entry-in-index (car keys))
		   (ebib--update-buffers)))))
	(default (beep))))

(defun ebib-select-database ()
  (let ((l (length ebib--databases)))
	(if (<= l 1)
		ebib--cur-db
	  (let* ((pairs (mapcar (lambda (db) (cons (file-name-base (cdr (assq 'filename db))) db))
							ebib--databases))
			 (cur-db-name (file-name-base (cdr (assq 'filename ebib--cur-db))))
			 (names (mapcar 'car pairs))
			 (choice (completing-read "Ebib database: " names nil t cur-db-name
									  nil cur-db-name)))
		(cdr (assoc choice pairs))))))

(defun ebib-capture-consistent-buffer ()
  (let ((db (ebib-db-new-database))
		(duplicates '())
		(originals '()))
	(let ((result (ebib--bib-find-bibtex-entries db nil)))
	  (if (zerop (car result))
		  (message "No entries found")
		(maphash
		 (lambda (key value)
		   (if (ebib-db-get-entry key ebib--cur-db t)
			   (push key duplicates)
			 (push key originals)))
		 (ebib-db-val 'entries db))
		(when duplicates
		  (message "Found duplicate keys: %S" duplicates)
		  (goto-char 0)
		  (search-forward (car duplicates))
		  (setq originals nil))
		originals))))

(defun ebib-capture-raw-bibtex (entry)
  (require 'ebib)
  (unless ebib--cur-db
	(ebib))
  (with-current-buffer (get-buffer-create "*Biblio entry*")
	(erase-buffer)
	(insert entry)
	(bibtex-mode)
	(bibtex-set-dialect "BibTeX")
	(ebib-capture-mode)
	(goto-char 0)
	(search-forward "{" nil t)
	(pop-to-buffer (current-buffer))
	(current-buffer)))

(defun ebib-capture-selection-entry (start end)
  (interactive "r")
  (ebib-capture-raw-bibtex (buffer-substring start end)))

(when nil
(ebib-capture-raw-bibtex "@Article{Garc_a_Ripoll_1999,
  author       = {García-Ripoll, Juan J. and Pérez-García, Víctor
				  M. and Torres, Pedro},
  title        = {Extended Parametric Resonances in Nonlinear
				  Schrödinger Systems},
  year         = 1999,
  volume       = 83,
  number       = 9,
  month        = {Aug},
  pages        = {1715–1718},
  issn         = {1079-7114},
  doi          = {10.1103/physrevlett.83.1715},
  url          = {http://dx.doi.org/10.1103/physrevlett.83.1715},
  journal      = {Physical Review Letters},
  publisher    = {American Physical Society (APS)}
}"))

Other goodies

Browse current Ebib entry in Google Scholar. This is useful for seeing what cites this work.

(define-key ebib-index-mode-map (kbd "C-x g") 'ebib-browse-google-scholar)

(defun ebib-browse-google-scholar ()
  "Search this entry in Google Scholar. If the DOI is known,
use it as match. Otherwise try the title."
  (interactive)
  (ebib--execute-when
	(entries
	 (let ((key (ebib--get-key-at-point))
		   (value nil))
	   (if (setq value (ebib-get-field-value ebib-doi-field key ebib--cur-db 'noerror 'unbraced 'xref))
		   (with-temp-buffer
			 (insert value)
			 (goto-char (point-min))
			 (while (re-search-forward "\\([ \n\t]+\\|https?://\\(dx\\.\\)?doi\\.org/\\)" nil t)
			   (replace-match ""))
			 (setq value (concat "doi:" (buffer-substring-no-properties (point-min) (point-max)))))
		 (setq value (concat "title:\"" (ebib-get-field-value "title" key ebib--cur-db 'noerror 'unbraced 'xref) "\"")))
	   (ebib--call-browser (concat "https://scholar.google.com/scholar?q="
								   (url-hexify-string value)))))
	(default
	  (beep))))
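The query that ends up in the browser is easy to reproduce by hand. A small sketch, using the DOI of the sample entry above; my/scholar-url is a hypothetical name, not part of the library.

```elisp
(require 'url-util)

(defun my/scholar-url (query)
  "Return a Google Scholar search URL for QUERY (hypothetical helper)."
  (concat "https://scholar.google.com/scholar?q="
          (url-hexify-string query)))

(my/scholar-url "doi:10.1103/physrevlett.83.1715")
;; => "https://scholar.google.com/scholar?q=doi%3A10.1103%2Fphysrevlett.83.1715"
```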

Closing

(provide 'ebib-biblio-interface)

Configuration