## Metadata
Author:: Randy Au
Title:: It's OK to Use Spreadsheets in Data Science – Towards Data Science
Source URL:: https://towardsdatascience.com/its-ok-to-use-spreadsheets-in-data-science-c1d0eff95b8b

## Highlights
- But it's probably the greatest Swiss army chainsaw for data for the sorts of ugly work that no one ever wants to admit they have to do every day. In an ideal world they wouldn't be necessary, but when there's a combination of tech debt, time pressure, poor data quality, and stakeholders who don't know anything but spreadsheets, they're invaluable.

- There's even a whole "European Spreadsheet Risks Interest Group (EuSpRiG)" (founded in 1999!) that's dedicated to Spreadsheet Risk Management a.k.a. how not to ruin your business via spreadsheet snafu.

- Most of the other issues arise when people try to make a spreadsheet do too much, such as act as a database, data warehouse, or project management tool, when more powerful and user-friendly dedicated solutions exist.

- The only real way to get a good sense of the data is to look at distributions and visualizations, and to sample it directly in raw form. Spreadsheets are generally great for this. I tend to find them less clunky than using pandas to poke around at arbitrary chunks of rows.

- The trick to knowing when to stop: if you're seriously considering writing a macro or something, stop.

- Many times, there's no way to deal with data sets like the above other than writing some kind of brittle hard-coded mapping function. It's honestly a challenge to keep everything consistent and documented over years of production; the mix of camelCase and underscores points to that. Doing a simple aggregation for meaningful analysis is an utter pain in the butt.

- There's a reason why lots of BI tools of all levels have a kind of "export to CSV/Excel" feature. Lots of very smart analytic people don't know much about coding in Python or R.

- So why not have just a CSV, the universal data transfer format? You can, but it makes leaving a data source trail more work. You can package all the relevant information needed to pull a data set into a tab in the spreadsheet, whether it's relevant queries, links to scripts, whatever.
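
For reference, the "brittle hard-coded mapping function" from the highlight on camelCase and underscores might look something like the sketch below. Everything in it is hypothetical: the labels, the `MANUAL_OVERRIDES` table, and the function names are invented for illustration and do not come from the article. It only shows why such a mapping tends to be fragile: a couple of regex rules cover the predictable cases, and a hand-maintained dictionary has to absorb everything else.

```python
import re
from collections import Counter

# Hypothetical hand-maintained overrides for labels that no rule can fix.
# This table is the brittle part: every new oddity needs a new entry.
MANUAL_OVERRIDES = {
    "chkout_done": "checkout_complete",
}


def normalize_label(raw: str) -> str:
    """Map an inconsistently named label onto one snake_case spelling."""
    label = raw.strip()
    # Insert an underscore at each lower-to-upper boundary (pageView -> page_View).
    label = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", label)
    # Treat runs of spaces and hyphens as a single underscore, then lowercase.
    label = re.sub(r"[\s\-]+", "_", label).lower()
    # Fall back to the hand-written table for anything the rules miss.
    return MANUAL_OVERRIDES.get(label, label)


def count_by_label(labels):
    """Aggregate raw labels after normalizing them."""
    return Counter(normalize_label(label) for label in labels)


# Made-up sample labels showing the kind of drift described above.
raw_events = ["pageView", "page_view", "Page View", "checkoutComplete", "chkout_done"]
counts = count_by_label(raw_events)
print(counts)
```

With these made-up inputs, the three page-view spellings collapse into one key and the two checkout spellings into another, but only because `chkout_done` happens to be listed in the override table.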