PyData Global 2021 slides

Oct 28, 2021

I gave a talk today at PyData Global about how Git works under the hood. If you want to think through how to build your own version control system, and learn more about how Git stores things and why, check out my slides here!

More

ODSC West 2020 slides

Oct 28, 2020

Earlier today, I gave a talk on Business Skills for Data Science at ODSC West (virtual). There are lots of talks about technical skills for data scientists (I’ve given them myself), but I think business skills are equally important in doing impactful work. Check out my slides here:

More

PyCon 2019 slides

May 7, 2019

I went to my third PyCon last weekend, and gave a talk about pre-mortems and post-mortems, and how teams can use them to learn from failure. As promised in that talk, I’m making my slides available for anyone who wants to reference them. Those slides are here:

More

Basic Tools for Tuning Heuristic Optimizers

Oct 18, 2016

Note: This post is a follow-up to another post I wrote, which is a more general introduction to heuristic optimization algorithms. I recommend you read my earlier post before this one.

I wrote this post with helpful input from the awesome ladies at a Chicago Write/Speak/Code meetup! Kara Carrell wrote a great post on The Four A’s, and I’ll link to what others wrote as it comes online.

More

My Talk at PyData Chicago 2016

Aug 29, 2016

Last weekend I gave a talk at PyData Chicago! It was called “Evolutionary Algorithms: Perfecting the Art of ‘Good Enough’” (props to my thesis advisor, Stefano Allesina, for the catchy title). It was heavily based on my blog post and workshops I’ve given. Heuristic optimizers are a fun topic for me because they are so general and useful, but they’re not really a hot topic, and I think a lot of people have just never encountered them. They’re a great addition to a data scientist’s toolkit.

More

Collaborating with your future self using Markdown documents

Jun 3, 2016

The pipeline for an analysis project can get complicated and confusing, especially if you’re simulating your own data. I often create pipelines with several different scripts in different languages, but it’s easy to forget a step. But a couple of months ago, I wrote myself a little Markdown file that looks something like this*:

More

Capturing Shell Output in R and Python

Mar 31, 2016

Sometimes I spend significant time in R or Python trying to do something that is trivial in bash. Calling out to the shell is especially useful when I’m working with very large files that will take a long time to read in. Why read in an entire file to get the last line, when I could just use tail -n 1? Or if I want the line count, why read it in when wc -l will get the job done faster?
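As a rough illustration of the idea in Python, here is a minimal sketch that shells out to tail and wc using the standard library's subprocess module. The helper names last_line and line_count are made up for this example, and it assumes a Unix-like system where tail and wc are on the PATH.

```python
import subprocess


def last_line(path):
    """Return the last line of a file by calling `tail -n 1` (hypothetical helper)."""
    result = subprocess.run(
        ["tail", "-n", "1", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.rstrip("\n")


def line_count(path):
    """Return the number of newline characters in a file by calling `wc -l` (hypothetical helper)."""
    result = subprocess.run(
        ["wc", "-l", path],
        capture_output=True,
        text=True,
        check=True,
    )
    # wc prints "<count> <path>", possibly with leading spaces; keep the count.
    return int(result.stdout.split()[0])
```

Passing the command as a list, rather than as a single string with shell=True, avoids quoting problems when a path contains spaces.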

More

I Published a Thing in Code Words!

Mar 17, 2016

The Recurse Center puts out a quarterly publication called Code Words, which publishes articles that try to capture the fun of digging into a problem and learning about programming. I wrote a piece on the grammar of graphics, and how it can provide a language for exploring and talking about data visualization. It uses examples from R’s ggplot2 package, but the ideas are more general. Check out my article, and other great articles from RC alums, in the sixth issue of Code Words!

More

What '.' Means in R, and Why it Matters

Dec 10, 2015

As far as I can tell, the R community has no generally accepted style guide. Google and Hadley Wickham both have style guides, but across and even within CRAN packages, different naming and spacing conventions abound. You’re likely to find variables named in camelCase, snake_case, or, interestingly, dot.case. This last convention is unusual, because unlike many languages, R does not enforce specific syntactic meaning for dots. Dots can denote methods for S3 classes, but they don’t have to. This means that R only cares about dots sometimes, with confusing results.

More

Creating and populating a database using Python and SQLAlchemy. Part 2: Classes and queries

Sep 8, 2015

Last month I wrote a post on the SQLAlchemy engine and session. Now I’m going to describe how you can set up a mapping for your schema so that you can populate and query your database.

More

Open Files and inodes in Python and Beyond

Aug 31, 2015

Today I was trying to solve a SQLAlchemy bug in Python when I discovered a strange behavior. I would set up logging to a file, run some SQLAlchemy code, and look at the log. Then I would open the file in emacs and clear out the log if there wasn’t anything interesting. But then, when I ran more SQLAlchemy, the logging wouldn’t work anymore. Python didn’t throw an error, it just stopped logging altogether. Even if I tried to reset my logging by setting up a new session, I couldn’t get any more log statements without quitting and reopening IPython altogether.
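The excerpt does not say what caused this, though the post's title points at inodes. As a sketch of one way such a symptom can arise on a POSIX system, the snippet below assumes the file is saved by writing a new file and renaming it over the old path (an assumption for illustration, not something stated above). The function name demo_replaced_log and the file name app.log are made up for this example.

```python
import os
import tempfile


def demo_replaced_log(directory):
    """Show that an open handle keeps writing to the old inode after the path is replaced."""
    log_path = os.path.join(directory, "app.log")  # hypothetical log file

    # The long-running process opens the log once and keeps the handle.
    handle = open(log_path, "a")
    handle.write("first message\n")
    handle.flush()

    # Assumed save strategy: write a fresh, empty file and rename it over the path.
    tmp_path = log_path + ".tmp"
    with open(tmp_path, "w") as new_file:
        new_file.write("")
    os.replace(tmp_path, log_path)

    # The process logs again, with no error, but into the old (now unlinked) inode.
    handle.write("second message\n")
    handle.flush()

    open_inode = os.fstat(handle.fileno()).st_ino  # inode behind the open handle
    path_inode = os.stat(log_path).st_ino          # inode the path now points to
    handle.close()

    with open(log_path) as f:
        visible = f.read()  # what you would see by opening the path
    return open_inode, path_inode, visible


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open_inode, path_inode, visible = demo_replaced_log(d)
        print("handle inode:", open_inode)
        print("path inode:  ", path_inode)
        print("visible log: ", repr(visible))
```

Under that assumption the two inode numbers differ, and the second message never shows up in the file at the original path, even though no write raised an error.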

More

Creating and populating a database using Python and SQLAlchemy. Part 1: Communicating with your database

Aug 4, 2015

I’m spending the summer at the Recurse Center, where I’m working with a group of other awesome programmers to learn and self-study programming full-time for three months.

More

Heuristic optimization algorithms for fun and (academic) profit

Aug 4, 2015

Optimization algorithms are one of those things that you might learn about in an undergraduate CS class, then quickly forget. But if you need a good answer to a computationally intensive problem, there’s really no substitute for them. There are optimization algorithms with a strong mathematical basis (such as gradient descent), but these are generally based on certain assumptions of how the problem is defined and what your fitness landscape is like. Heuristic algorithms (such as hill climbers, simulated annealing, and evolutionary algorithms) make few assumptions but no guarantees. They are fairly agnostic to the shape and structure of your solution space and fitness function, but they make no promises that you will ever find the best solution.
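To make the idea concrete, here is a minimal sketch of the simplest heuristic mentioned above, a hill climber, for a one-dimensional problem. The function names, the step size, and the toy fitness function are all assumptions chosen for illustration; this is not code from the post.

```python
import random


def hill_climb(fitness, start, step=0.1, iterations=1000, seed=0):
    """Maximize `fitness` by repeatedly trying small random moves.

    A candidate is kept only if it scores strictly better than the current
    best, so the search can stall on a local optimum.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = start
    best_score = fitness(best)
    for _ in range(iterations):
        # Propose a nearby solution by nudging the current best.
        candidate = best + rng.uniform(-step, step)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def example_fitness(x):
    """Toy fitness landscape with a single peak at x = 3 (hypothetical)."""
    return -(x - 3.0) ** 2


if __name__ == "__main__":
    solution, score = hill_climb(example_fitness, start=0.0)
    print(solution, score)
```

The climber only ever calls the fitness function, which is why it needs so few assumptions about the problem. On a landscape with several peaks, the same code can settle on whichever peak is nearest to the starting point, which is the "no guarantees" half of the trade-off.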

More