Structural Equation Modelling – an underused form of multivariate analysis



One of the challenges in data analysis is to get a sense of storyline. My distaste for crosstabs (yeah, sure they can tell us stuff) comes from their fragmentary style of storytelling. If this were a crime scene, then crosstabs would give you a sliver of glass here, a trace of gunpowder over there, as well as a few fingerprints which may or may not be connected to the main story. The evidence seldom hangs together.

Years ago I was thinking how cool it would be if we could somehow construct a flowchart showing how A causes B, drives C etcetera – when a colleague tapped me on the shoulder and said something like “Duh…you haven’t used structural equation models?”

So ten years ago I added AMOS to my little library of SPSS products and wow, what a useful implement.  The classic case study they always use to illustrate AMOS is how Education (which school you went to) and Income (how rich your family is) help predict those SAT scores that make or break your chances of getting into Harvard.

This is the data they use on the demonstration video:

As the video demonstrates, a flowchart style model is very quick to put together, and even a simple model shows the relative importance of the drivers. In this case, being rich doesn’t give you good grades…but it does help you get into a good school which DOES give you good grades.

The other good thing about SEMs is that they factor in all those variables that we haven’t asked about in the survey. For example, in staff satisfaction surveys about half of what causes staff engagement has nothing to do with pay, or training, or leadership style, or the perks of the job – nice though these may be. Something like 50% of the story is due to the attitude of the staff member – whether they’re fundamentally engaged as an individual, or jaded and indifferent to the world around them. So in a staff survey SEM we might create an “unobserved variable” which helps us get a measure of these outside forces. I like it because otherwise we often assume that only the things we measure are driving the outcomes: that somehow 100% of the SAT story is determined by Income and Education alone. Never mind that little Johnny at Auckland Grammar is still a lazy sod.
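For readers without AMOS, the Income, Education and SAT story can be roughed out in a few lines of Python. This is only a sketch under loud assumptions: the data is synthetic (invented here, not the data from the demonstration video), the function name `fit_paths` and the coefficient labels are hypothetical, and the model is a plain path analysis fitted with two ordinary regressions rather than the full maximum-likelihood SEM that AMOS estimates. It does show the two ideas above: an indirect route (income to school to SAT) and a leftover share of the outcome that the measured variables do not explain.

```python
import numpy as np


def standardize(x):
    """Rescale to mean 0 and standard deviation 1 so paths are comparable."""
    return (x - x.mean()) / x.std()


def fit_paths(income, school, sat):
    """Estimate standardized paths for income -> school -> sat,
    plus a direct income -> sat path, using two least-squares fits."""
    income, school, sat = (
        standardize(np.asarray(v, dtype=float)) for v in (income, school, sat)
    )

    # Equation 1: school = a * income + residual
    a = np.linalg.lstsq(income[:, None], school, rcond=None)[0][0]

    # Equation 2: sat = b * school + c * income + residual
    X = np.column_stack([school, income])
    b, c = np.linalg.lstsq(X, sat, rcond=None)[0]

    # Share of SAT variance left to everything we did not measure
    resid = sat - X @ np.array([b, c])
    unexplained = resid.var() / sat.var()

    return {
        "income_to_school": float(a),
        "school_to_sat": float(b),
        "income_to_sat_direct": float(c),
        "income_to_sat_indirect": float(a * b),
        "sat_unexplained": float(unexplained),
    }


# Synthetic data (an assumption for illustration only): income helps you
# into a good school, the school drives SAT, income has no direct effect,
# and about half of SAT is down to unmeasured things.
rng = np.random.default_rng(0)
n = 10_000
income = rng.normal(size=n)
school = 0.6 * income + rng.normal(scale=0.8, size=n)
sat = 0.7 * school + rng.normal(scale=0.7, size=n)

paths = fit_paths(income, school, sat)
for name, value in paths.items():
    print(f"{name:>24}: {value:.2f}")
```

On data built this way the direct income path should come out near zero while the indirect path through school is clearly positive, and roughly half of the SAT variance stays unexplained. That leftover is the lazy-Johnny share: the part of the story the survey never asked about.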

The result of an SEM is an elegant model that quickly tells us what the drivers are. I recall doing one for a Government department (customer satisfaction) where the survey asked dozens of questions, ranging from demographics to deliverables such as promptness of answering the phone, the demeanor of the staff and so on. Well, when we put it together in one model, most variables didn’t matter a toss. The one that really mattered, and explained 70% of the story, was this: when you phoned up to get an answer to a problem…did you get a helpful answer? There were dozens of deliverables but these were all peripheral to the main desire of their customers: they just wanted answers.

SEMs deliver a holistic picture in a format that non-statisticians easily get. They’re more useful for describing what’s going on than as an exploratory tool. And they’re easy to generate, and deliver diagnostic statistics so you can see if your models are significant and/or meaningful. My question is – why do I only see these used in academia?
