Recently, there has been much interest in the question of whether deep natural language understanding models exhibit systematicity; generalizing such that units like words make consistent contributions to the meaning of the sentences in which they appear. There is accumulating evidence that neural models often generalize non-systematically. We examined the notion of systematicity from a linguistic perspective, defining a set of probes and a set of metrics to measure systematic behaviour. We also identified ways in which network architectures can generalize non-systematically, and discuss why such forms of generalization may be unsatisfying. As a case study, we performed a series of experiments in the setting of natural language inference (NLI), demonstrating that some NLU systems achieve high overall performance despite being non-systematic.