Sunday, June 01, 2014

A typical way to use Emacs is to open lots of files as buffers. In the default Emacs setup, the command to switch to a file depends on whether the file is open:

             File       Non-file
Opened       C-x b      C-x b
Not opened   C-x C-f    N/A

I don't want to have to remember whether a file is open. Instead, I want the command to depend on whether I'm switching to a file or a non-file. I also want to find files without having to switch folders first, because I work on lots of small projects in different folders. I use these bindings:

             File       Non-file
Opened       Cmd-T      C-x b
Not opened   Cmd-T      N/A

In a previous post, I described an earlier attempt: using helm-for-files to open files from many different directories, backed by locate (mdfind on Mac). As part of that, I improved my Mac OS mdfind setup to include all my text files. I was hoping that I could make mdfind fast and precise enough that I'd use it all the time. Unfortunately, I couldn't get the queries to run faster than 300 milliseconds, which didn't feel fast enough to run on every keystroke. There are other Emacs packages for this, but I ended up with my own custom setup.

I took inventory of the files I wanted to open: they have to be text, they are in my home directory, and they are not in one of a set of excluded subdirectories (such as .git or ~/Library/). I have only a few thousand of these. So why not just load them all into Emacs? It'll make the search-as-I-type really fast.
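As a rough illustration of the "not in one of a set of subdirectories" rule, here is a minimal sketch of a filter over a newline-separated list of paths. The function name filter_paths and the two patterns are assumptions for illustration, not the real exclusion list, and $HOME is used directly as a regular expression, which is only safe when it contains no special characters.

```shell
#!/bin/sh
# Hypothetical helper: drop paths under directories that should never be
# offered as candidates. The patterns are illustrative only.
filter_paths() {
    grep -v -e '/\.git/' -e "^$HOME/Library/"
}

# Usage: pipe any newline-separated list of paths through the filter.
printf '%s\n' \
    "$HOME/notes/todo.org" \
    "$HOME/proj/.git/config" \
    "$HOME/Library/Preferences/x.plist" \
  | filter_paths
```

Only the first path survives the filter. The same function could sit at the end of whatever pipeline produces the file list, so exclusions stay in one place.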

After playing around with various options, I ended up with the idea of extending the notion of "recent files" to include all of the ones I might open. This will cover most everything. I can then fill in the rest with locate/mdfind. To do this, I dynamically bind recentf-list before calling helm-for-files.

This setup is specific to my system; I hope it gives you ideas for making your own.

The initial setup has an Emacs component and a shell component.

(defvar amitp/global-file-list '() "List of most files I work with")

(defun amitp/helm-for-files ()
  "Global filename match, over all files I typically open"
  (interactive)
  (let ((helm-ff-transformer-show-only-basename nil)
        (recentf-list (mapcar 'abbreviate-file-name amitp/global-file-list)))
    (helm-other-buffer
     '(helm-source-recentf)
     "*helm for files*")))

(define-key global-map [(hyper t)] 'amitp/helm-for-files)

To fill amitp/global-file-list I run a shell script in cron:

# build-filename-list.sh
mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' || kMDItemFSName = 'Makefile'" \
   >~/.global-file-list.txt

The real script also adds other filename extensions that aren't classified as 'public.text'. Then I read the list into Emacs:

(defun read-file-into-lines (filename)
  "Read file, split into lines, return a list"
  (with-temp-buffer
    (insert-file-contents filename)
    (split-string (buffer-substring-no-properties (point-min) (point-max)) "\n" t)))

(setq amitp/global-file-list (read-file-into-lines "~/.global-file-list.txt"))

This is fast. And it does what I want. But the order of results isn't ideal. If I could sort them better, the most likely results would show up first, and I'd have to type less to get to the file I want. And I also wanted to make sure I was able to find files that weren't captured in the precomputed list.

I decided to sort in this way:

  1. open buffers first
  2. current directory's files
  3. recently opened files
  4. the global file list
  5. anything mdfind can come up with (helm-source-locate)
  6. any other file system path (helm-source-find-files)

(defun amitp/helm-for-files ()
  "Global filename match, over all files I typically open"
  (interactive)
  (let ((helm-ff-transformer-show-only-basename nil)
        (recentf-list
         (append (amitp/buffer-file-names)
                 (helm-skip-boring-files (directory-files default-directory t))
                 recentf-list
                 amitp/global-file-list)))
    (helm-other-buffer
     '(helm-source-recentf helm-source-locate helm-source-find-files)
     "*helm for files*")))

(require 'cl-lib)  ; provides cl-loop

(defun amitp/buffer-file-names ()
  "A list of filenames for the current buffers"
  (cl-loop for filename in (mapcar 'buffer-file-name (buffer-list))
           when filename
           collect filename))

It was better, but not quite enough. The output of mdfind isn't sorted, but I can roughly sort it by putting it into groups:

  1. files I've modified in the past 1 day
  2. files I've modified in the past 2-4 days
  3. files I've modified in the past 5-14 days
  4. files I've modified in the past 15-45 days
  5. all other files

(mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' && kMDItemFSContentChangeDate > \$time.today(-1)";\
 mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' && kMDItemFSContentChangeDate > \$time.today(-4) && kMDItemFSContentChangeDate <= \$time.today(-1)";\
 mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' && kMDItemFSContentChangeDate > \$time.today(-14) && kMDItemFSContentChangeDate <= \$time.today(-4)";\
 mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' && kMDItemFSContentChangeDate > \$time.today(-45) && kMDItemFSContentChangeDate <= \$time.today(-14)";\
 mdfind -onlyin "$HOME" "kMDItemContentTypeTree = 'public.text' && kMDItemFSContentChangeDate <= \$time.today(-45)"\
) >~/.global-file-list.txt

The real script also includes all the other filename extensions that aren't classified as public.text. The result is definitely better. Combined with recentf, the "working set" of recent files is rather short, and I don't have to type much to get to the file I want.
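The five recency groups share one pattern: each bucket is bounded by two "days ago" values. A minimal sketch of generating the query strings from the list of boundaries, rather than writing each one out, might look like this. The names BASE, DATE and bucket_queries are hypothetical, and the function only prints the queries; it does not run mdfind.

```shell
#!/bin/sh
# Sketch: build the recency-bucket queries from a list of day boundaries.
# BASE, DATE and bucket_queries are illustrative names, not from the real script.
BASE="kMDItemContentTypeTree = 'public.text'"
DATE="kMDItemFSContentChangeDate"

bucket_queries() {
    newer=""    # newer edge of the current bucket, in days ago (empty for the first)
    for older in 1 4 14 45; do
        q="$BASE && $DATE > \$time.today(-$older)"
        if [ -n "$newer" ]; then
            q="$q && $DATE <= \$time.today(-$newer)"
        fi
        printf '%s\n' "$q"
        newer=$older
    done
    # Final bucket: everything older than the last boundary.
    printf '%s\n' "$BASE && $DATE <= \$time.today(-$newer)"
}

# On macOS, the queries could then be run in order, for example:
#   bucket_queries | while IFS= read -r q; do mdfind -onlyin "$HOME" "$q"; done \
#     > ~/.global-file-list.txt
bucket_queries
```

Because the buckets do not overlap, concatenating their results in order yields each file once, with the most recently modified groups first. Changing the boundaries then only means editing the `1 4 14 45` list.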

I've needed other heuristics as well (some posted in comments), and I continue to refine those. I can now use Cmd-T somewhat like I do in a browser, where I open a new tab and then start typing a URL or query. Another cute trick: in browsers, Cmd-Shift-T opens a recently closed tab; I do something similar in Emacs:

(define-key global-map [(hyper shift t)] 'helm-recentf)

Overall, I'm reasonably happy with this system, but I'm still looking for ways to make it better. Check out different ways to find files globally and see if anything fits your own needs.

Update: [2014-07-13] I wanted it to run even faster. I now start with helm-source-recentf, which is fast, and then add helm-source-locate and helm-source-find-files only on a keypress. One of the things I realized is that helm-recentf lets me type words out of order, whereas helm-locate does not, so it's useful to keep the two separate.

(defun amitp/helm-for-files ()
  "Global filename match, over all files I typically open"
  (interactive)
  (let ((helm-ff-transformer-show-only-basename nil)
        (recentf-list
         (mapcar 'abbreviate-file-name
                 (append (amitp/buffer-file-names)
                         (helm-skip-boring-files (directory-files default-directory t))
                         recentf-list
                         amitp/global-file-list))))
    (helm
     :sources '(helm-source-recentf)
     :buffer "*helm for files*")))

(defun amitp/helm-locate (candidate)
  "Fallback when helm-recentf doesn't find what I want.
CANDIDATE is ignored; it is only here because helm passes the
selected candidate to every action."
  (helm :sources '(helm-source-locate helm-source-find-files)
        :buffer "*helm locate*"
        :input helm-input
        :resume 'noresume))

(defun amitp/helm-for-files-fallback ()
  (interactive)
  (helm-quit-and-execute-action 'amitp/helm-locate))

(bind-key "C-l" 'amitp/helm-for-files-fallback helm-generic-files-map)

Update: [2014-09-29] It'd probably be cleaner to use helm-filesets but I haven't yet changed my configuration to use that.

6 comments:

Amit wrote at Sunday, June 1, 2014 at 10:29:00 PM PDT

There are lots of other details I've left out. There are some text files that Mac OS doesn't recognize as text, so I've made my cron job pick those up as well. I sorted buffers by project, with my own project system that I use with tabbar. Helm-recentf automatically removes duplicates once you start typing. I do some renaming of symlinked files that I normally keep in ~/Dropbox but I want emacs to see them in another location. I have the shell script exclude a whole bunch of directories that I don't care about.

Anonymous wrote at Monday, June 2, 2014 at 12:43:00 AM PDT

For sorting you may want to try enabling adaptive sorting in helm. It stores how often you select certain items and the more often you select a file the higher it will be in the candidate list.

Anonymous wrote at Monday, June 2, 2014 at 4:58:00 AM PDT

"Unfortunately I couldn't get the queries to run faster than 300 milliseconds, and it didn't feel fast enough to run on every keystroke."

That's why helm has an idle delay. There is no point in running the search on every keystroke while you keep typing; it only needs to run once you pause, and that delay is configurable.

Amit wrote at Monday, June 2, 2014 at 7:24:00 AM PDT

Anonymous 1: thanks! I'll take a look at adaptive sorting.

Anonymous 2: yes, exactly. Since I'm using this to replace switch-to-buffer, I want to just type something and press return without idling for the results to pop up. Preloading the results of locate/mdfind into a local variable eliminates the delay.

Anonymous wrote at Monday, June 2, 2014 at 9:59:00 AM PDT

This is clever. I like this idea.

Btw, you can replace amitp/shorten-file-name with the more general abbreviate-file-name.

Amit wrote at Monday, June 2, 2014 at 10:10:00 AM PDT

Thanks Anonymous 3! I've made the change, both in my emacs setup and in the blog post. I kept thinking there should be such a function but I couldn't find it, so I ended up writing my own. :-)