#status/processed

# Metadata
Author:: [[Randy Au]]
Title:: It’s OK to Use Spreadsheets in Data Science – Towards Data Science
Full Title:: It’s OK to Use Spreadsheets in Data Science – Towards Data Science
Import Date:: 2023-05-13
Source:: #source/readwise/instapaper
Source URL:: [Source URL](https://towardsdatascience.com/its-ok-to-use-spreadsheets-in-data-science-c1d0eff95b8b)
Review URL:: [Review URL](https://readwise.io/bookreview/26338667)
# Document
Tags:: [[Data Science]] [[Spreadsheets]]
# Highlights
- But it’s probably the greatest Swiss army chainsaw for data for the sorts of ugly work that no one ever wants to admit they have to do every day. ==In an ideal world they wouldn’t be necessary, but when there’s a combination of tech debt, time pressure, poor data quality, and stakeholders who don’t know anything but spreadsheets, they’re invaluable.==
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484473)
- There’s even a whole “European Spreadsheet Risks Interest Group (EuSpRiG)” (founded in 1999!) that’s dedicated to Spreadsheet Risk Management a.k.a. how not to ruin your business via spreadsheet snafu.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484475)
- Note: Amazing
- The majority of other issues is when people attempt to make a spreadsheet do too much, like becoming a database, data warehouse, project management tool, when more powerful and user-friendly dedicated solutions exist.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484481)
- The only real way to get a good sense of the data is to look at distributions, visualizations, and directly sampling it in raw form. Spreadsheets are generally great for this. I tend to find it less clunky than using pandas to poke around at arbitrary chunks of rows.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484487)
- ==The trick to know when to stop is if you’re seriously considering writing a macro or something, stop.==
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484499)
- Many times, ==there’s no other way to deal with data sets like the above other than writing some kind of brittle hard-coded mapping function of some kind==. It’s honestly a challenge to keep everything consistent and documented over years of production, the mix of camelCase and underscores points to that. Doing a simple aggregation for meaningful analysis is an utter pain in the butt.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484500)
- There’s a reason why lots of BI tools of all levels have a kind of “export to CSV/Excel” feature. Lots of very smart analytic people don’t know much about coding in Python or R.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484502)
- So why not have just a CSV, the universal data transfer format? You can, but it makes leaving a data source trail more work. You can package all the relevant information needed to pull a data set into a tab in the spreadsheet, whether it’s relevant queries, links to scripts, whatever.
- Date:: [[2019-03-30]]
- Find: [View Highlight](https://instapaper.com/read/1177860845/10484503)
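
A minimal sketch of the "brittle hard-coded mapping function" mentioned in the highlight about mixed camelCase and underscore names. The event names, `EVENT_NAME_MAP`, `canonical_event`, and `count_events` are all hypothetical, invented here for illustration; they are not from the article. The idea is just that inconsistent raw names get mapped to one canonical form before aggregating, and unmapped names fail loudly instead of silently splitting the counts.

```python
from collections import Counter

# Hypothetical mapping from inconsistent raw names to one canonical name.
# Every entry is hand-maintained, which is exactly what makes this brittle.
EVENT_NAME_MAP = {
    "signupComplete": "signup_complete",
    "signup_complete": "signup_complete",
    "SignupComplete": "signup_complete",
    "checkoutStart": "checkout_start",
    "checkout_start": "checkout_start",
}


def canonical_event(raw_name: str) -> str:
    """Return the canonical name, raising if the raw name is not mapped."""
    key = raw_name.strip()
    if key not in EVENT_NAME_MAP:
        # Fail loudly so a new spelling is noticed rather than miscounted.
        raise KeyError(f"unmapped event name: {raw_name!r}")
    return EVENT_NAME_MAP[key]


def count_events(raw_names):
    """Aggregate raw event names into counts per canonical name."""
    return Counter(canonical_event(name) for name in raw_names)


if __name__ == "__main__":
    sample = ["signupComplete", "signup_complete", "checkoutStart"]
    print(count_events(sample))
```

Raising on unknown names is a design choice for this sketch: it trades convenience for a guarantee that the aggregation never quietly drops or splits a category when a new spelling appears upstream.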