
My Personal ML / AI / Data Stack

The excellent 2023 MAD (ML / AI / Data) Landscape, put together every year by Matt Turck of FirstMark Capital, covers so many categories, and so many solutions within each category, that the number of options a developer can choose from is almost overwhelming.

While reading through these categories, I realized that a list would help me remember which tools I like the most and which ones I want to go back and research further, so I've made one.

I don't claim to use anywhere near all of these all the time, but I thought it would be useful (at least for myself) to note which tools and categories I find myself reaching for the most, and why.

This list will definitely change over time, which is another reason to document what I'm using today: I can easily go back and compare down the road.

Primary categories:

Storage

  • Backblaze
  • Cloudflare

Data Warehouses

  • Google BigQuery

Analytics

  • Looker

Visualization

  • Plotly
  • Streamlit

Data Science Notebooks

  • Jupyter
  • Google Colab

Data Science Platforms

  • Anaconda

NoSQL Databases

  • Redis

Automation & Operations

  • Zapier

Data Analyst Platforms

  • Airtable

Applications - Horizontal

  • GitHub Copilot

NLP

  • Hugging Face
  • Google Cloud Natural Language API
  • Lots of open-source and homegrown

Product Analytics

  • Google Analytics
  • Plus lots of others not listed here

ELT / ETL / Data Transformation

  • Airbyte
  • Airflow

Vector Databases

  • ChromaDB
  • Pinecone (if absolutely necessary)

Horizontal AI / AGI

  • OpenAI

Closed Source Models

  • OpenAI GPT-3.5, GPT-4
  • Google Bard

Databases

  • Postgres
  • Redis

OLAP

  • DuckDB

Streaming / Messaging

  • RabbitMQ (I'm a total noob but we use it a lot at DemandSphere)

Stat Tools & Languages

  • Python
  • Pandas
  • NumPy
  • Dask (not listed in the landscape)

AI Frameworks & Libraries

  • TensorFlow
  • PyTorch
  • Keras

This is obviously a lot and, as mentioned, I don't use all of these all the time.

These are just the tools I have some level of familiarity with and find myself turning to over the others listed in the landscape.

I generally favor tools that either have at least an open-source basis or are mainstream enough (such as Google's tools) that I'm likely to be able to integrate with them easily on client engagements.