Srini Kadamati

Apache Superset from Scratch: Day 1 (Python Setup)

December 23, 2021

I'm on a quest to understand and map out as much of the Apache Superset codebase as I can. In my day job, I get to use Superset daily, but I'm not intimately familiar with the code paths themselves. This series documents the process on an M1 MacBook Air, but it should generalize to most *nix systems.

My goal is to make noticeable progress on a daily basis. With the preamble out of the way, let's start!

CONTRIBUTING.md

The Superset codebase is large; where does one even begin? For new codebases, I generally like alternating between breadth and depth.

For breadth, I'll start with the "Setup Local Environment for Development" section of CONTRIBUTING.md.

Python 3.8

Python 3.7.x or 3.8.x are recommended for running the Superset backend. I'm on a Mac, and I prefer to leave the default python that ships with the operating system (2.7.x) alone. Instead, I'll use Homebrew to install Python 3.8:

brew install python@3.8

Now, both the python3 and pip3 commands work as expected (independent of the python and pip commands)!
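Since the recommendation is specifically 3.7.x or 3.8.x, it's worth asking the interpreter itself which version it is. This is a minimal sketch; is_supported_python and SUPPORTED are hypothetical names of my own that mirror the recommendation above, not anything from the Superset code base.

```python
import sys

# Hypothetical helper: the (major, minor) versions recommended for the
# Superset backend at the time of writing (3.7.x or 3.8.x).
SUPPORTED = {(3, 7), (3, 8)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if version_info (a tuple or sys.version_info) is 3.7.x or 3.8.x."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    print(sys.version.split()[0], "recommended:", is_supported_python())
```

Saved as a file (say, a hypothetical check_python.py) and run with python3 rather than python, it should report the Homebrew 3.8.x version followed by recommended: True.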

Virtualenv

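Now it's time to create a Python virtual environment. A virtual environment is essentially a sandbox for your Python libraries that lives within a specific folder or project. This workflow gives you a few benefits, the biggest being isolation: each project gets its own set of library versions, separate from the system Python and from your other projects.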

Are there any downsides?

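First, let me install virtualenv (strictly speaking, this step is optional here: the python3 -m venv command used below relies on the venv module built into Python 3's standard library, not on the third-party virtualenv package):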

pip3 install virtualenv

Next, let's give our virtual environment a name. Creating the environment adds a folder within your project folder, and all of the Python libraries you install get stuffed in there. So we're really deciding on the name of this folder.

The CONTRIBUTING.md file in the Superset repo suggests naming it venv:

python3 -m venv venv
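Under the hood, python3 -m venv venv runs the venv module from Python's standard library, which can also be driven from Python code. The sketch below is a hypothetical helper (create_env is my own name, not part of Superset or the venv API) built on the real venv.EnvBuilder class; it only approximates the command-line defaults.

```python
import os
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir, roughly like `python3 -m venv env_dir`."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,           # bootstrap pip via ensurepip, as the CLI does by default
        symlinks=(os.name != "nt"),  # the CLI symlinks the interpreter on non-Windows platforms
    )
    builder.create(env_dir)
    return Path(env_dir)
```

Calling create_env("venv") from the root of the Superset checkout should leave you with the same venv/ folder (including its pyvenv.cfg marker file) as the shell command.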

Why should we name it venv/? One hint is in the .gitignore file, which lists the file and folder paths that Git should ignore. That way, each contributor can keep their own local state without those details getting checked into version control.

The .gitignore file itself is version controlled, though. So this file provides a "universal" agreement between all Superset contributors that these files should not be checked into version control. Let's search the .gitignore for any lines containing "env":

grep 'env' .gitignore

This returns:

.env
.envrc
env
venv*
env_py3
envpy3
env36
venv
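To double-check which of these patterns would actually catch a given folder name, Python's fnmatch module gets close to Git's matching for simple patterns like these. This is an approximation under stated assumptions: gitignore_patterns and matching_patterns are hypothetical helpers, and real .gitignore rules (negation, anchoring, **) are richer than plain globbing.

```python
from fnmatch import fnmatchcase
from typing import List


def gitignore_patterns(text: str) -> List[str]:
    """Return the non-blank, non-comment lines of a .gitignore file."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def matching_patterns(name: str, patterns: List[str]) -> List[str]:
    """Return the patterns that match a bare file or folder name.

    Approximation only: plain glob matching, ignoring .gitignore features
    such as negation (!), leading-slash anchoring and ** wildcards.
    """
    return [p for p in patterns if fnmatchcase(name, p.rstrip("/"))]


# Only the lines returned by the grep above, not the full .gitignore.
SAMPLE = """.env
.envrc
env
venv*
env_py3
envpy3
env36
venv
"""

print(matching_patterns("venv", gitignore_patterns(SAMPLE)))   # ['venv*', 'venv']
print(matching_patterns(".venv", gitignore_patterns(SAMPLE)))  # []
```

Per these lines, a venv/ folder is covered twice over, while a .venv/ folder would not be ignored at all.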

While some open source projects use the .venv/ convention for virtual environments, Superset seems to use venv. So if we name our folder venv, Git will already ignore it.

Let's stick to the community convention and run the suggested command:

python3 -m venv venv

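If we run ls from within the superset/ folder, we'll see venv listed as a folder. Success! Before installing anything into it, we also need to activate the environment, so that python3 and pip3 resolve to venv/ rather than to the global Homebrew install:

source venv/bin/activate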
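To confirm which environment an interpreter belongs to, one common check compares sys.prefix with sys.base_prefix. A minimal sketch; in_virtualenv is a hypothetical helper name, while the sys attributes are standard.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv (and modern virtualenv) point sys.prefix at the environment folder
    # while sys.base_prefix keeps the base install; legacy virtualenv releases
    # set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtualenv())
```

Run with venv/bin/python, it should print a prefix ending in venv and True; run with the Homebrew interpreter directly, it should print False.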

Python Dependencies

Usually, Python requirements are specified in a single requirements.txt file. In the case of Superset, we're blessed with a whole requirements/ folder of .in and .txt files. There's a lot we could explore and unpack here, but I'm going to focus on getting everything set up first.

If we look at CONTRIBUTING.md, we see:

pip install -r requirements/testing.txt

If we open that file, we see something that resembles a standard requirements.txt file, but with this header:

# This file is autogenerated by pip-compile-multi
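Despite the autogenerated header, as far as I can tell these compiled .txt files are ordinary pip requirements files, made up mostly of exact name==version pins. To see at a glance what a file pins (for instance, which mysqlclient version it asks for), here is a simplified parser. parse_pins is a hypothetical helper, and SAMPLE_REQUIREMENTS is an illustrative snippet, not the real contents of requirements/testing.txt.

```python
from typing import Dict


def parse_pins(text: str) -> Dict[str, str]:
    """Map package name -> pinned version for `name==version` lines.

    Simplified sketch: skips blank lines, comments and pip options such as
    `-r base.txt`, and drops inline comments and environment markers.
    """
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop full-line and inline comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        line = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


# Illustrative input only; "-r base.txt" is a made-up include line.
SAMPLE_REQUIREMENTS = """# This file is autogenerated by pip-compile-multi
-r base.txt
mysqlclient==2.1.0
"""

print(parse_pins(SAMPLE_REQUIREMENTS))  # {'mysqlclient': '2.1.0'}
```

Pointing it at the real file (open("requirements/testing.txt").read()) should give a quick inventory of the pinned dependencies.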

I've made a mental note to dig into pip-compile-multi (a tool for compiling multiple requirements files) later. For now, let's run the following command to install the dependencies:

pip3 install -r requirements/testing.txt

Error 1: MySQL

I ran into this scary red error text on my M1 MacBook:

Collecting mysqlclient==2.1.0
  Using cached mysqlclient-2.1.0.tar.gz (87 kB)
    ERROR: Command errored out with exit status 1:
     command: /opt/homebrew/opt/python@3.8/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/6d/f0fzvlyn6sd58q5rmx6s6df00000gn/T/pip-install-6c548wua/mysqlclient_a8c054d3233d4d00acb42d6a6bf2a562/setup.py'"'"'; __file__='"'"'/private/var/folders/6d/f0fzvlyn6sd58q5rmx6s6df00000gn/T/pip-install-6c548wua/mysqlclient_a8c054d3233d4d00acb42d6a6bf2a562/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/6d/f0fzvlyn6sd58q5rmx6s6df00000gn/T/pip-pip-egg-info-0735tk4h

     WARNING: Discarding https://files.pythonhosted.org/packages/de/79/d02be3cb942afda6c99ca207858847572e38146eb73a7c4bfe3bdf154626/mysqlclient-2.1.0.tar.gz#sha256=973235686f1b720536d417bf0a0d39b4ab3d5086b2b6ad5e6752393428c02b12 (from https://pypi.org/simple/mysqlclient/) (requires-python:>=3.5). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    ERROR: Could not find a version that satisfies the requirement mysqlclient==2.1.0 (from versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.8, 1.3.9, 1.3.10, 1.3.11rc1, 1.3.11, 1.3.12, 1.3.13, 1.3.14, 1.4.0rc1, 1.4.0rc2, 1.4.0rc3, 1.4.0, 1.4.1, 1.4.2, 1.4.2.post1, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.1.0rc1, 2.1.0)
    ERROR: No matching distribution found for mysqlclient==2.1.0

Some Stack Overflow sleuthing suggested installing MySQL via Homebrew: mysqlclient compiles a C extension against MySQL's client library, and the Homebrew formula provides that library (along with the mysql_config script the build uses to locate it). So this may not be an M1-related issue after all:

brew install mysql
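Both this error and the next one boil down to a native build tool missing from PATH. As I understand it, mysqlclient's source build (at least for the 2.1.x release pinned here) shells out to mysql_config, and psycopg2's error names pg_config explicitly. A small hypothetical helper (missing_build_tools is my own name) can check for both before kicking off a long pip run:

```python
import shutil
from typing import Iterable, List


def missing_build_tools(tools: Iterable[str]) -> List[str]:
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # mysql_config is used when building mysqlclient; pg_config when building psycopg2.
    print("missing:", missing_build_tools(["mysql_config", "pg_config"]))
```

An empty list means both scripts are reachable; anything listed is worth fixing before rerunning pip3 install.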

Error 2: Postgres

While mysqlclient now installed successfully, pip got stuck on psycopg2, the PostgreSQL adapter:

Error: pg_config executable not found.

pg_config is required to build psycopg2 from source.  Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:

    python setup.py build_ext --pg-config /path/to/pg_config build ...

or with the pg_config option in 'setup.cfg'.

If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.

Let's check out Stack Overflow again. I like using the Postgres Mac app (Postgres.app), which contains a pg_config executable.

So I'm going to find the path to that pg_config file and add its folder to my PATH. I'll first crack open the Postgres.app folder:

Opening Postgres.app Folder

After clicking through a few folders, I found the pg_config executable. As suggested on Stack Overflow, I'm going to add that executable's folder to my PATH:

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/14/bin
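Note that this export only changes PATH for the current shell session and the processes it launches. For doing the same from Python (for example, before invoking pip through subprocess), here is a minimal sketch; with_extra_path is a hypothetical helper, while shutil.which and its path argument are standard library. The Postgres.app path is the one from my machine and will differ for other Postgres versions.

```python
import os
import shutil
from typing import Dict, Mapping


def with_extra_path(env: Mapping[str, str], directory: str) -> Dict[str, str]:
    """Return a copy of env with directory appended to PATH (like `export PATH=$PATH:dir`)."""
    new_env = dict(env)  # copy so the caller's mapping is left untouched
    current = new_env.get("PATH", "")
    new_env["PATH"] = current + os.pathsep + directory if current else directory
    return new_env


if __name__ == "__main__":
    env = with_extra_path(os.environ, "/Applications/Postgres.app/Contents/Versions/14/bin")
    # shutil.which can search an explicit PATH string via its `path` argument.
    print("pg_config:", shutil.which("pg_config", path=env["PATH"]))
```

The resulting dictionary can be passed as env= to subprocess.run. To make the change permanent instead, the export line can go in your shell profile (~/.zshrc on recent macOS, where zsh is the default shell).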

Now when I pip3 install -r requirements/testing.txt again, everything works beautifully!

Editable Superset

Now we're ready to install Superset in "editable" mode. Editable mode links the installed package to our source checkout instead of copying it into site-packages, so code changes take effect without reinstalling. That makes it ideal when developing features or fixing bugs.

pip3 install -e .

To test the installation, run the superset command and the Superset CLI should appear:

Superset CLI
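Beyond seeing the CLI, one way to confirm the install really is editable is to check where Python would import superset from: with an editable install it should resolve to files inside your checkout rather than site-packages. A minimal sketch; is_from_source_tree is a hypothetical helper built on importlib.util.find_spec, and path/to/superset is a placeholder for wherever you cloned the repo.

```python
import importlib.util
from pathlib import Path


def is_from_source_tree(module_name: str, repo_root: str) -> bool:
    """Return True if module_name would be imported from a file inside repo_root."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:  # not installed, or no file on disk
        return False
    origin = Path(spec.origin).resolve()
    return Path(repo_root).resolve() in origin.parents


if __name__ == "__main__":
    # Placeholder path: point this at your own Superset checkout.
    print("editable superset:", is_from_source_tree("superset", "path/to/superset"))
```

It's best run from outside the checkout: when the current directory is the repo root, it's already on sys.path and would mask a non-editable install.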

Next Up

That's it for Day 1. On Day 2, I'll play with setting up the metadata database, creating roles and permissions, loading example data, and starting the backend server.

If you want to follow along, use the RSS feed. Stay tuned! 📺