Sometimes I spend significant time in R or Python trying to do
something that is trivial in bash. Reaching for the shell is especially useful when I’m
working with very large files that take a long time to read
in. Why read in an entire file to get the last line, when I could just
tail -n 1? Or if I want the line count, why read it in when
wc -l will get the job done faster?
It turns out that it’s not too complicated to capture shell output in R or Python. Here’s how I do it.
If you use Python 3, capturing shell output is pretty simple (if
you’re still on Python 2, the tides are turning! It’s time to make the
change!). You can use the
subprocess module to get the output in
bytes, then decode and parse it.
    import subprocess

    ## Get the last line of the file 'fname'
    last_line = subprocess.check_output("tail -n 1 " + fname, shell=True)

    ## convert to string and parse
    ## 'UTF-8' is a common encoding, but you may need to use something else
    last_line = last_line.decode('UTF-8').strip()
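As a variation on the snippet above, here is a minimal sketch that wraps the same idea in small helper functions. It assumes Python 3.7 or later (for `capture_output` and `text`) and that `tail` and `wc` are available on the system. The helper names `run_command`, `last_line`, and `line_count` are made up for illustration. Passing the command as a list rather than a single string skips the shell entirely, so file names with spaces don't need quoting, and `text=True` returns a string so there is no separate decode step.

```python
import subprocess


def run_command(args):
    """Run a command given as a list of arguments and return its stdout as text."""
    # check=True raises CalledProcessError if the command exits with an error.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def last_line(path):
    """Return the last line of the file at `path`, using `tail -n 1`."""
    return run_command(["tail", "-n", "1", path]).strip()


def line_count(path):
    """Return the number of newline characters in the file, using `wc -l`."""
    # `wc -l <path>` prints the count followed by the file name; keep only the count.
    return int(run_command(["wc", "-l", path]).split()[0])
```

With a file name stored in `fname`, `last_line(fname)` and `line_count(fname)` would then give the last line and the line count without reading the file into Python. Note that `text=True` decodes with the locale's default encoding; pass `encoding=` to `subprocess.run` if you need something else.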
R makes this process easy too. You may have used
system() before to
submit shell commands. It turns out that if you set the argument
intern = TRUE, you’ll get the output as a character vector, so you
don’t even have to deal with encoding! The output may take some
parsing, but the
stringr package is good for that.
    require(stringr)

    ## Get the last line of the file 'fname'
    lastLine = system(stringr::str_c("tail -n 1 ", fname), intern = TRUE)

    ## strip leading/trailing whitespace
    lastLine = stringr::str_trim(lastLine)
This has saved me from reinventing the wheel many times since I learned it. Hopefully it helps you too!