<p><strong>Apache Superset from Scratch: Day 7 (Metadata for Examples)</strong></p>
<p>Srini Kadamati, 2021-12-30</p>
<p>I ended Day 6 with a good understanding of the first half of the <code>load_world_bank_health_n_pop()</code> function, which loads the World Health dashboard example. Today, I'm hoping to understand the rest of the function if possible.</p>
<p>The next line of code is:</p>
<div class="highlight"><pre><span></span><code><span class="k">table</span> <span class="o">=</span> <span class="n">get_table_connector_registry</span><span class="p">()</span>
</code></pre></div>
<p>The <code>get_table_connector_registry()</code> function seems to be defined in <code>superset/superset/examples/helpers.py</code>. The function definition is very simple:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">get_table_connector_registry</span><span class="p">()</span> <span class="o">-></span> <span class="nl">Any:</span>
    <span class="k">return</span> <span class="n">ConnectorRegistry</span><span class="p">.</span><span class="n">sources</span><span class="p">[</span><span class="s">"table"</span><span class="p">]</span>
</code></pre></div>
<h3>Helper Functions for Superset Examples</h3>
<p>This function is so simple that I'm wondering why it needs to exist.</p>
<blockquote>
<p>Why can't <code>ConnectorRegistry.sources["table"]</code> just be defined directly in the <code>load_world_bank_health_n_pop()</code> function within <code>world_bank.py</code>?</p>
</blockquote>
<p>Let me take a peek at the rest of the functions in <code>superset/examples/helpers.py</code> to try to gain more context:</p>
<ul>
<li><code>get_table_connector_registry()</code>: yet to be determined!</li>
<li><code>get_examples_folder()</code>: returns exact path to where example datasets, dashboards, etc are stored.</li>
<li><code>update_slice_ids(layout_dict: Dict[Any, Any], slices: List[Slice])</code>: does some type of sorting of slices</li>
<li><code>merge_slice(slc: Slice)</code>: deletes existing Slice and creates new one in its place</li>
<li><code>get_slice_json(defaults: Dict[Any, Any], **kwargs: Any)</code>: unclear! But something around JSON representation of Slice objects.</li>
<li><code>get_example_data(filepath: str, is_gzip: bool = True, make_bytes: bool = False)</code>: common utility function for fetching JSON dataset from a URL and reading in the bytes (we saw this earlier!)</li>
</ul>
<p>These all smell like functions that any example-loading script can benefit from and re-use. So it makes sense that they're all defined here for common use.</p>
<h3>Why ConnectorRegistry?</h3>
<p>ConnectorRegistry sounds interesting, as it sounds like some type of registry that maintains the available database connectors in the current Superset installation. This led me to the following question:</p>
<blockquote>
<p>Why not store the available connectors in the metadata database?</p>
</blockquote>
<p>My hunch is that this adds extra friction, slows down the development process, and doesn't add too much. </p>
<p>You can imagine a Superset contributor having multiple local git branches and wanting to quickly switch between them. In my past life as a backend engineer, I've personally experienced the pains of database state causing issues between versions of the same software.</p>
<p>For connectors, the <em>state</em> is likely defined entirely in the code.</p>
<ul>
<li>The connector libraries are either defined in code that lives in the source tree, or they aren't!</li>
<li>The <code>db_engine_spec</code> for a given database either exists in the source tree, or it doesn't!</li>
</ul>
<p>Let's move on to the implementation for ConnectorRegistry.</p>
<h3>Connector Registry</h3>
<p>Where is the ConnectorRegistry class defined? The docstring at the top of the class definition is:</p>
<div class="highlight"><pre><span></span><code><span class="nv">Central</span> <span class="nv">Registry</span> <span class="k">for</span> <span class="nv">all</span> <span class="nv">available</span> <span class="nv">datasource</span> <span class="nv">engines</span>
</code></pre></div>
<p>In <code>superset/superset/connectors/connector_registry.py</code>, the ConnectorRegistry class is defined with the following class methods.</p>
<ul>
<li><code>register_sources()</code></li>
<li><code>get_datasource()</code></li>
<li><code>get_all_datasources()</code></li>
<li><code>get_datasource_by_id()</code></li>
<li><code>get_datasource_by_name()</code></li>
<li><code>query_datasources_by_permissions()</code></li>
<li><code>get_eager_datasource()</code></li>
<li><code>query_datasources_by_name()</code></li>
</ul>
<p>Because the ConnectorRegistry class acts as a source of truth, it can just have class methods that other parts of the codebase can call to look up information. It won't ever be instantiated into individual objects.</p>
<p>The <code>register_sources()</code> class method piqued my interest, as it probably <em>registers</em> new data sources. When is this actually called though? It's only called in <code>superset/superset/initialization/__init__.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">configure_data_sources</span><span class="p">(</span><span class="n">self</span><span class="p">)</span> <span class="o">-></span> <span class="nl">None:</span>
    <span class="p">#</span> <span class="n">Registering</span> <span class="n">sources</span>
    <span class="n">module_datasource_map</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">config</span><span class="p">[</span><span class="s">"DEFAULT_MODULE_DS_MAP"</span><span class="p">]</span>
    <span class="n">module_datasource_map</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">config</span><span class="p">[</span><span class="s">"ADDITIONAL_MODULE_DS_MAP"</span><span class="p">])</span>
    <span class="n">ConnectorRegistry</span><span class="p">.</span><span class="n">register_sources</span><span class="p">(</span><span class="n">module_datasource_map</span><span class="p">)</span>
</code></pre></div>
<p>This makes sense. The possible data source engines only need to be registered during the Flask app initialization.</p>
<p>What is the default value of <code>"DEFAULT_MODULE_DS_MAP"</code>?</p>
<div class="highlight"><pre><span></span><code>DEFAULT_MODULE_DS_MAP = OrderedDict(
    [
        ("superset.connectors.sqla.models", ["SqlaTable"]),
        ("superset.connectors.druid.models", ["DruidDatasource"]),
    ]
)
</code></pre></div>
<p>What is the default value of <code>"ADDITIONAL_MODULE_DS_MAP"</code>?</p>
<div class="highlight"><pre><span></span><code><span class="nl">ADDITIONAL_MODULE_DS_MAP</span><span class="p">:</span><span class="w"> </span><span class="n">Dict</span><span class="o">[</span><span class="n">str, List[str</span><span class="o">]</span><span class="err">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{}</span><span class="w"></span>
</code></pre></div>
<p>So this code path essentially builds an ordered dictionary that maps module paths to class names and hands it to <code>register_sources()</code>. Interesting.</p>
<p>Let's circle back to <code>get_table_connector_registry()</code>, which essentially boils down to:</p>
<div class="highlight"><pre><span></span><code>ConnectorRegistry.sources["table"]
</code></pre></div>
<p>This code references the following class variable:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">ConnectorRegistry:</span>
    <span class="s">"""Central Registry for all available datasource engines"""</span>
    <span class="n">sources:</span> <span class="n">Dict</span>[<span class="n">str</span>, <span class="n">Type</span>[<span class="s">"BaseDatasource"</span>]] = {}
</code></pre></div>
<p>What does this value look like for our current Superset instance?</p>
<div class="highlight"><pre><span></span><code>>>> ConnectorRegistry.sources
{'table': &lt;class 'superset.connectors.sqla.models.SqlaTable'&gt;, 'druid': &lt;class 'superset.connectors.druid.models.DruidDatasource'&gt;}
</code></pre></div>
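<p>To get a feel for how a module-to-class map can turn into a dictionary like the one above, here's a minimal sketch of the registration idea. <code>MiniRegistry</code> is a hypothetical, simplified stand-in and not Superset's actual implementation. It imports standard-library classes so that it runs anywhere, and it keys each class by its lowercased name, which is an assumption on my part (the output above suggests Superset uses short type strings such as "table" instead):</p>

```python
from importlib import import_module
from typing import Dict, List


class MiniRegistry:
    """Hypothetical, simplified registry (not Superset's ConnectorRegistry)."""

    # Class-level dictionary shared by all callers; the class is never instantiated.
    sources: Dict[str, type] = {}

    @classmethod
    def register_sources(cls, module_map: Dict[str, List[str]]) -> None:
        """Import every listed class and store it under a short lookup key."""
        for module_name, class_names in module_map.items():
            module = import_module(module_name)
            for class_name in class_names:
                source_class = getattr(module, class_name)
                # Assumption: use the lowercased class name as the key.
                cls.sources[class_name.lower()] = source_class


# Standard-library classes stand in for SqlaTable and DruidDatasource.
MiniRegistry.register_sources({"collections": ["OrderedDict", "Counter"]})
print(MiniRegistry.sources["counter"])
```

<p>Once registration has run at startup, any other part of the code can look up a class with a plain dictionary access, which is all that <code>get_table_connector_registry()</code> does.</p>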
<p>From Superset's standpoint, databases are either:</p>
<ul>
<li>a Druid data source, connected using the legacy Druid connector, in which case the datasets are JSON (I think)</li>
<li>a SQLAlchemy data source, in which case the datasets are all tables</li>
</ul>
<p>The returned dictionary matches the default value of <code>"DEFAULT_MODULE_DS_MAP"</code>:</p>
<div class="highlight"><pre><span></span><code>DEFAULT_MODULE_DS_MAP = OrderedDict(
    [
        ("superset.connectors.sqla.models", ["SqlaTable"]),
        ("superset.connectors.druid.models", ["DruidDatasource"]),
    ]
)
</code></pre></div>
<p>This <em>seems</em> like a LOT of steps and code just to return a tiny dictionary of values!</p>
<blockquote>
<p>ConnectorRegistry doesn't even seem to return the actual <em>database connectors</em> that are registered. This is a bit weird!</p>
</blockquote>
<p>Interestingly, the SqlaTable class does warrant further investigation. It seems to be an ORM model / wrapper for the SQLAlchemy table objects with some Superset-specific niceties.</p>
<h3>Searching for a Table</h3>
<p>After looking up the SqlaTable class through the ConnectorRegistry, here's the next line of code:</p>
<div class="highlight"><pre><span></span><code>tbl = db.session.query(table).filter_by(table_name=tbl_name).first()
</code></pre></div>
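<p>This code uses SQLAlchemy syntax to generate a SQL query against Superset's metadata database, looking for the dataset record whose <code>table_name</code> is "wb_health_population". Because <code>table</code> is the SqlaTable class, the query only considers SQLAlchemy-backed datasets that Superset is aware of, and <code>.first()</code> returns just the first match (or <code>None</code> if there isn't one). The result, a SqlaTable object, is assigned to <code>tbl</code>.</p>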
<p>The next code fragment creates the table if it doesn't exist:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="nv">not</span> <span class="nv">tbl</span>:
    <span class="nv">tbl</span> <span class="o">=</span> <span class="nv">table</span><span class="ss">(</span><span class="nv">table_name</span><span class="o">=</span><span class="nv">tbl_name</span>, <span class="nv">schema</span><span class="o">=</span><span class="nv">schema</span><span class="ss">)</span>
</code></pre></div>
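<p>To make the look-up-then-create pattern concrete, here's a minimal sketch that uses only Python's built-in <code>sqlite3</code> module instead of SQLAlchemy. The <code>datasets</code> table and the <code>get_or_create()</code> helper are hypothetical names for illustration and are not part of Superset. Unlike the loader, which only builds the object here and saves it later, this sketch writes the new record right away:</p>

```python
import sqlite3
from typing import Tuple

conn = sqlite3.connect(":memory:")
# Hypothetical metadata table holding one record per dataset.
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, table_name TEXT)")


def get_or_create(conn: sqlite3.Connection, table_name: str) -> Tuple[int, bool]:
    """Return (record id, created flag) for the dataset named table_name."""
    # Rough analogue of query(...).filter_by(table_name=...).first():
    # fetch at most one matching record, or None when nothing matches.
    row = conn.execute(
        "SELECT id FROM datasets WHERE table_name = ? LIMIT 1", (table_name,)
    ).fetchone()
    if row is not None:
        return row[0], False
    # Rough analogue of `if not tbl: tbl = table(...)`.
    cursor = conn.execute(
        "INSERT INTO datasets (table_name) VALUES (?)", (table_name,)
    )
    conn.commit()
    return cursor.lastrowid, True


first = get_or_create(conn, "wb_health_population")
second = get_or_create(conn, "wb_health_population")
print(first, second)
```

<p>The second call finds the record created by the first one, so re-running an example loader doesn't pile up duplicate records.</p>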
<p>Next up, this code reads in "countries.md", which contains information on the World Health dataset. Then, it attaches this information to the description column for the SqlaTable object:</p>
<div class="highlight"><pre><span></span><code>tbl.description = utils.readfile(
    os.path.join(get_examples_folder(), "countries.md")
)
</code></pre></div>
<p>The next three lines of code set values to three more columns within this SqlaTable object:</p>
<div class="highlight"><pre><span></span><code>tbl.main_dttm_col = "year"
tbl.database = database
tbl.filter_select_enabled = True
</code></pre></div>
<h3>Defining Metrics for World Health Dashboard</h3>
<p>The raw metrics that need to be created to power this Superset dashboard are represented as the following strings:</p>
<div class="highlight"><pre><span></span><code>metrics = [
    "sum__SP_POP_TOTL",
    "sum__SH_DYN_AIDS",
    "sum__SH_DYN_AIDS",
    "sum__SP_RUR_TOTL_ZS",
    "sum__SP_DYN_LE00_IN",
    "sum__SP_RUR_TOTL",
]
</code></pre></div>
<p>The next block of code does the following for each metric-string:</p>
<ul>
<li>searches all of the <code>tbl</code> object's <code>metrics</code> to check if it already exists</li>
<li>slices the metric-string to extract first 3 characters (e.g. "sum")</li>
<li>uses SQLAlchemy's <code>column()</code> to compile the name of the column that will be aggregated (everything after the <code>sum__</code> prefix) into SQL</li>
<li>appends the metric to <code>tbl.metrics</code>, using the <code>SqlMetric</code> class</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="k">for</span> <span class="n">metric</span> <span class="ow">in</span> <span class="n">metrics</span><span class="p">:</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">any</span><span class="p">(</span><span class="n">col</span><span class="o">.</span><span class="n">metric_name</span> <span class="o">==</span> <span class="n">metric</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">tbl</span><span class="o">.</span><span class="n">metrics</span><span class="p">):</span>
        <span class="n">aggr_func</span> <span class="o">=</span> <span class="n">metric</span><span class="p">[:</span><span class="mi">3</span><span class="p">]</span>
        <span class="n">col</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">column</span><span class="p">(</span><span class="n">metric</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">db</span><span class="o">.</span><span class="n">engine</span><span class="p">))</span>
        <span class="n">tbl</span><span class="o">.</span><span class="n">metrics</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
            <span class="n">SqlMetric</span><span class="p">(</span><span class="n">metric_name</span><span class="o">=</span><span class="n">metric</span><span class="p">,</span> <span class="n">expression</span><span class="o">=</span><span class="n">f</span><span class="s2">"{aggr_func}({col})"</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
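<p>The string slicing is easier to follow in isolation. Here's a minimal sketch of the same loop without Superset or SQLAlchemy: <code>Metric</code> is a hypothetical stand-in for <code>SqlMetric</code>, <code>build_metrics()</code> is a made-up helper, and the column name is used as-is instead of being compiled against a database engine:</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metric:
    """Hypothetical stand-in for Superset's SqlMetric model."""

    metric_name: str
    expression: str


def build_metrics(metric_names: List[str]) -> List[Metric]:
    """Turn strings like 'sum__SP_POP_TOTL' into Metric objects, skipping repeats."""
    collected: List[Metric] = []
    for name in metric_names:
        # Same guard as the loader: skip a metric name that is already present.
        if any(m.metric_name == name for m in collected):
            continue
        aggr_func = name[:3]  # first three characters, e.g. "sum"
        col = name[5:]  # everything after the five-character "sum__" prefix
        collected.append(Metric(metric_name=name, expression=f"{aggr_func}({col})"))
    return collected


metrics = build_metrics(
    ["sum__SP_POP_TOTL", "sum__SH_DYN_AIDS", "sum__SH_DYN_AIDS", "sum__SP_RUR_TOTL_ZS"]
)
for m in metrics:
    print(m.metric_name, "->", m.expression)
```

<p>Note that <code>"sum__SH_DYN_AIDS"</code> appears twice in the list of metric strings. The <code>any()</code> check means the second occurrence is simply skipped.</p>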
<p>The next three lines seem to commit these changes to the database and re-fetch the metadata:</p>
<div class="highlight"><pre><span></span><code>db.session.merge(tbl)
db.session.commit()
tbl.fetch_metadata()
</code></pre></div>
<p>We're nearly done understanding the World Health dashboard example! But I've run out of time for today. I'll save this for Day 8!</p>
<p><strong>Apache Superset from Scratch: Day 6 (Database Class)</strong></p>
<p>Srini Kadamati, 2021-12-29</p>
<p>I ended Day 5 with the knowledge of the Superset shell and a hunch that it might be a better tool for understanding the different code paths for how an example is loaded.</p>
<p>Now I'm going to try running some commands to begin emulating what the app is doing when loading an example. First things first, let's run the <code>utils.get_example_database()</code> function call:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">superset.utils</span> <span class="kn">import</span> <span class="n">core</span> <span class="k">as</span> <span class="n">utils</span>
<span class="o">>>></span> <span class="n">database</span> <span class="o">=</span> <span class="n">utils</span><span class="o">.</span><span class="n">get_example_database</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">database</span>
<span class="n">examples</span>
</code></pre></div>
<p>Interesting. Superset returns the string value "examples". This is likely just the string representation of the returned "Database" object. We know that the examples database in our Superset installation lives in my home directory, as a SQLite file. So running the next command within the <code>load_world_bank_health_n_pop()</code> function should give us that information:</p>
<div class="highlight"><pre><span></span><code>>>> engine = database.get_sqla_engine()
>>> engine
Engine(sqlite:////Users/srinik/.superset/superset.db)
</code></pre></div>
<p>Success! </p>
<h3>The Superset Database Class</h3>
<p>Next, I want to better understand the returned <code>database</code> object. The class for Database is defined in <code>superset/superset/models/core.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">Database</span>(
    <span class="n">Model</span>, <span class="n">AuditMixinNullable</span>, <span class="n">ImportExportMixin</span>
): <span class="c1"># pylint: disable=too-many-public-methods</span>
    <span class="s">"""An ORM object that stores Database related information"""</span>
    <span class="n">__tablename__</span> = <span class="s">"dbs"</span>
    <span class="nb">type</span> = <span class="s">"table"</span>
    <span class="n">__table_args__</span> = (<span class="n">UniqueConstraint</span>(<span class="s">"database_name"</span>),)
    <span class="nb">id</span> = <span class="n">Column</span>(<span class="n">Integer</span>, <span class="n">primary_key</span>=<span class="nb">True</span>)
    <span class="n">verbose_name</span> = <span class="n">Column</span>(<span class="n">String</span>(<span class="mi">250</span>), <span class="nb">unique</span>=<span class="nb">True</span>)
    <span class="c1"># short unique name, used in permissions</span>
    <span class="n">database_name</span> = <span class="n">Column</span>(<span class="n">String</span>(<span class="mi">250</span>), <span class="nb">unique</span>=<span class="nb">True</span>, <span class="n">nullable</span>=<span class="nb">False</span>)
    ...
</code></pre></div>
<p>This is the same file that has the model definition for the CssTemplate class, which I stumbled into earlier in this series! At the top of <code>core.py</code> is the following text:</p>
<blockquote>
<p>A collection of ORM sqlalchemy models for Superset</p>
</blockquote>
<p>This file contains the class definitions for the following models:</p>
<ul>
<li>Url</li>
<li>KeyValue</li>
<li>CssTemplate</li>
<li>ConfigurationMethod</li>
<li>Database</li>
<li>Log</li>
<li>FavStarClassName</li>
<li>FavStar</li>
</ul>
<p>Let's dive deeper into the Database class!</p>
<p><strong>Columns / Fields</strong></p>
<p>The Database class defined in <code>core.py</code> maps to the <strong>"dbs"</strong> table in the metadata database, as suggested by this line of code:</p>
<div class="highlight"><pre><span></span><code>__tablename__ = "dbs"
</code></pre></div>
<p>What other columns are defined?</p>
<ul>
<li><code>id</code>: integer, primary key</li>
<li><code>verbose_name</code>: string, to specify a more human-friendly name?</li>
<li><code>database_name</code>: string, name of the database</li>
<li><code>sqlalchemy_uri</code>: string, likely the URI sent to the underlying database driver to connect</li>
<li><code>password</code>: the database password, stored in an encrypted column</li>
<li><code>cache_timeout</code>: integer, corresponding to the cache timeout in seconds at the database level</li>
<li><code>select_as_create_table_as</code>: boolean, not sure what this does</li>
<li><code>expose_in_sqllab</code>: boolean, should this db be exposed in SQL Lab?</li>
<li><code>configuration_method</code>: string, type of form used to configure?</li>
<li>several <code>allow_</code> fields around async, file upload, CTAS, CVAS, DML, multi schema metadata fetch, and other user-configurable features</li>
<li>several <code>extra_</code> fields around encryption, fields, etc.</li>
<li><em>and more</em></li>
</ul>
<p><strong>String Representation</strong></p>
<p>We know from our earlier exploration that evaluating a Superset Database object in the shell displays the database name. This aligns with the <code>__repr__()</code> definition for this model!</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">__repr__</span><span class="p">(</span><span class="kr">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">str</span><span class="o">:</span>
    <span class="kr">return</span> <span class="kr">self</span><span class="p">.</span><span class="n">name</span>
</code></pre></div>
<p><strong>Name Attribute</strong></p>
<p>If I access the <code>.name</code> attribute on a Database object, the following is evaluated:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">name</span><span class="p">(</span><span class="kr">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">str</span><span class="o">:</span>
    <span class="kr">return</span> <span class="kr">self</span><span class="p">.</span><span class="n">verbose_name</span> <span class="nf">if</span> <span class="kr">self</span><span class="p">.</span><span class="n">verbose_name</span> <span class="n">else</span> <span class="kr">self</span><span class="p">.</span><span class="n">database_name</span>
</code></pre></div>
<p>Interesting -- now we know how <code>verbose_name</code> is used! It's the preference for showing to humans, and <code>database_name</code> is the backup value displayed.</p>
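<p>The fallback behavior is easy to reproduce in a small stand-alone class. <code>FakeDatabase</code> below is a hypothetical stand-in with just the two name fields. It isn't Superset's ORM model, and making <code>name</code> a property is my assumption based on it being accessed as an attribute:</p>

```python
from typing import Optional


class FakeDatabase:
    """Hypothetical stand-in for the Database model, with only the name fields."""

    def __init__(self, database_name: str, verbose_name: Optional[str] = None) -> None:
        self.database_name = database_name
        self.verbose_name = verbose_name

    @property
    def name(self) -> str:
        # Prefer the human-friendly name and fall back to the short unique name.
        return self.verbose_name if self.verbose_name else self.database_name

    def __repr__(self) -> str:
        # The shell shows this string when the object is evaluated.
        return self.name


plain = FakeDatabase("examples")
labeled = FakeDatabase("examples", verbose_name="Example Datasets")
print(repr(plain), repr(labeled))
```

<p>This also matches what the shell showed earlier: with no <code>verbose_name</code> set, the object displays as its <code>database_name</code>.</p>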
<p><strong>Data Attribute</strong></p>
<p>What's next? The <code>.data</code> attribute looks interesting:</p>
<div class="highlight"><pre><span></span><code><span class="nv">@property</span><span class="w"></span>
<span class="n">def</span><span class="w"> </span><span class="k">data</span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">Dict</span><span class="o">[</span><span class="n">str, Any</span><span class="o">]</span><span class="err">:</span><span class="w"></span>
</code></pre></div>
<p>I want to run this for my Examples SQLite database and see what's returned:</p>
<div class="highlight"><pre><span></span><code>>>> database.data
{'id': 1,
'name': 'examples',
'backend': 'sqlite',
'configuration_method': 'sqlalchemy_form',
'allow_multi_schema_metadata_fetch': False,
'allows_subquery': True,
'allows_cost_estimate': False,
'allows_virtual_table_explore': True,
'explore_database_id': 1,
'parameters': {},
'parameters_schema': {}}
</code></pre></div>
<p><strong>Reserved Words</strong></p>
<p>Neat! I can also retrieve the reserved words for the database:</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">database</span><span class="o">.</span><span class="n">get_reserved_words</span><span class="p">()</span>
<span class="p">{</span><span class="s1">'right'</span><span class="p">,</span> <span class="s1">'select'</span><span class="p">,</span> <span class="s1">'check'</span><span class="p">,</span> <span class="s1">'having'</span><span class="p">,</span> <span class="s1">'virtual'</span><span class="p">,</span> <span class="s1">'before'</span><span class="p">,</span> <span class="s1">'fail'</span><span class="p">,</span> <span class="s1">'conflict'</span><span class="p">,</span> <span class="s1">'current_timestamp'</span><span class="p">,</span> <span class="s1">'escape'</span><span class="p">,</span> <span class="s1">'full'</span><span class="p">,</span> <span class="s1">'case'</span><span class="p">,</span> <span class="s1">'references'</span><span class="p">,</span> <span class="s1">'drop'</span><span class="p">,</span> <span class="s1">'begin'</span><span class="p">,</span> <span class="s1">'cast'</span><span class="p">,</span> <span class="s1">'view'</span><span class="p">,</span> <span class="s1">'of'</span><span class="p">,</span> <span class="s1">'insert'</span><span class="p">,</span> <span class="s1">'on'</span><span class="p">,</span> <span class="s1">'outer'</span><span class="p">,</span> <span class="s1">'cascade'</span><span class="p">,</span> <span class="s1">'in'</span><span class="p">,</span> <span class="s1">'attach'</span><span class="p">,</span> <span class="s1">'inner'</span><span class="p">,</span> <span class="s1">'vacuum'</span><span class="p">,</span> <span class="s1">'deferred'</span><span class="p">,</span> <span class="s1">'add'</span><span class="p">,</span> <span class="s1">'for'</span><span class="p">,</span> <span class="s1">'temporary'</span><span class="p">,</span> <span class="s1">'union'</span><span class="p">,</span> <span class="s1">'update'</span><span class="p">,</span> <span class="s1">'offset'</span><span class="p">,</span> <span class="s1">'as'</span><span class="p">,</span> <span class="s1">'where'</span><span 
class="p">,</span> <span class="s1">'transaction'</span><span class="p">,</span> <span class="s1">'explain'</span><span class="p">,</span> <span class="s1">'indexed'</span><span class="p">,</span> <span class="s1">'group'</span><span class="p">,</span> <span class="s1">'limit'</span><span class="p">,</span> <span class="s1">'to'</span><span class="p">,</span> <span class="s1">'pragma'</span><span class="p">,</span> <span class="s1">'unique'</span><span class="p">,</span> <span class="s1">'raise'</span><span class="p">,</span> <span class="s1">'initially'</span><span class="p">,</span> <span class="s1">'distinct'</span><span class="p">,</span> <span class="s1">'column'</span><span class="p">,</span> <span class="s1">'asc'</span><span class="p">,</span> <span class="s1">'notnull'</span><span class="p">,</span> <span class="s1">'null'</span><span class="p">,</span> <span class="s1">'between'</span><span class="p">,</span> <span class="s1">'rollback'</span><span class="p">,</span> <span class="s1">'end'</span><span class="p">,</span> <span class="s1">'when'</span><span class="p">,</span> <span class="s1">'deferrable'</span><span class="p">,</span> <span class="s1">'detach'</span><span class="p">,</span> <span class="s1">'match'</span><span class="p">,</span> <span class="s1">'all'</span><span class="p">,</span> <span class="s1">'temp'</span><span class="p">,</span> <span class="s1">'isnull'</span><span class="p">,</span> <span class="s1">'join'</span><span class="p">,</span> <span class="s1">'trigger'</span><span class="p">,</span> <span class="s1">'query'</span><span class="p">,</span> <span class="s1">'from'</span><span class="p">,</span> <span class="s1">'autoincrement'</span><span class="p">,</span> <span class="s1">'ignore'</span><span class="p">,</span> <span class="s1">'after'</span><span class="p">,</span> <span class="s1">'table'</span><span class="p">,</span> <span class="s1">'order'</span><span class="p">,</span> <span class="s1">'alter'</span><span 
class="p">,</span> <span class="s1">'reindex'</span><span class="p">,</span> <span class="s1">'is'</span><span class="p">,</span> <span class="s1">'intersect'</span><span class="p">,</span> <span class="s1">'primary'</span><span class="p">,</span> <span class="s1">'then'</span><span class="p">,</span> <span class="s1">'and'</span><span class="p">,</span> <span class="s1">'set'</span><span class="p">,</span> <span class="s1">'like'</span><span class="p">,</span> <span class="s1">'index'</span><span class="p">,</span> <span class="s1">'by'</span><span class="p">,</span> <span class="s1">'default'</span><span class="p">,</span> <span class="s1">'else'</span><span class="p">,</span> <span class="s1">'rename'</span><span class="p">,</span> <span class="s1">'plan'</span><span class="p">,</span> <span class="s1">'except'</span><span class="p">,</span> <span class="s1">'row'</span><span class="p">,</span> <span class="s1">'instead'</span><span class="p">,</span> <span class="s1">'natural'</span><span class="p">,</span> <span class="s1">'analyze'</span><span class="p">,</span> <span class="s1">'foreign'</span><span class="p">,</span> <span class="s1">'database'</span><span class="p">,</span> <span class="s1">'if'</span><span class="p">,</span> <span class="s1">'current_time'</span><span class="p">,</span> <span class="s1">'glob'</span><span class="p">,</span> <span class="s1">'current_date'</span><span class="p">,</span> <span class="s1">'cross'</span><span class="p">,</span> <span class="s1">'key'</span><span class="p">,</span> <span class="s1">'values'</span><span class="p">,</span> <span class="s1">'into'</span><span class="p">,</span> <span class="s1">'constraint'</span><span class="p">,</span> <span class="s1">'exists'</span><span class="p">,</span> <span class="s1">'left'</span><span class="p">,</span> <span class="s1">'delete'</span><span class="p">,</span> <span class="s1">'each'</span><span class="p">,</span> <span class="s1">'or'</span><span class="p">,</span> 
<span class="s1">'false'</span><span class="p">,</span> <span class="s1">'commit'</span><span class="p">,</span> <span class="s1">'exclusive'</span><span class="p">,</span> <span class="s1">'immediate'</span><span class="p">,</span> <span class="s1">'restrict'</span><span class="p">,</span> <span class="s1">'not'</span><span class="p">,</span> <span class="s1">'create'</span><span class="p">,</span> <span class="s1">'desc'</span><span class="p">,</span> <span class="s1">'true'</span><span class="p">,</span> <span class="s1">'using'</span><span class="p">,</span> <span class="s1">'replace'</span><span class="p">,</span> <span class="s1">'collate'</span><span class="p">}</span>
</code></pre></div>
<p><strong>Previewing Raw Data</strong></p>
<p>I want to peek at the data in my SQLite database. The <a href="https://sqlitebrowser.org/dl/">DB Browser for SQLite app</a> on Mac is a good option for this. Better yet, once installed, I can use my terminal to pass the app the file location of my sqlite DB!</p>
<div class="highlight"><pre><span></span><code><span class="nv">open</span> <span class="o">-</span><span class="nv">a</span> <span class="s2">"</span><span class="s">DB Browser for SQLite</span><span class="s2">"</span> <span class="o">/</span><span class="nv">Users</span><span class="o">/</span><span class="nv">srinik</span><span class="o">/</span>.<span class="nv">superset</span><span class="o">/</span><span class="nv">superset</span>.<span class="nv">db</span>
</code></pre></div>
<p>And voila!</p>
<p><img alt="DB Browser" src="/images/db_browser.png"></p>
<p>Let's preview the <code>dbs</code> table (which corresponds to the Database model).</p>
<p><img alt="DB Browser DBs" src="/images/db_browser_dbs.png"></p>
<p>It's nice to see all of the columns reflected here from the Database model.</p>
<h3>World Health Dashboard: Examples Database</h3>
<p>Let's revisit the <code>load_world_bank_health_n_pop()</code> function that loads the World Health Dashboard.</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">load_world_bank_health_n_pop</span><span class="p">(</span> <span class="c1"># pylint: disable=too-many-locals, too-many-statements</span>
    <span class="n">only_metadata</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span> <span class="n">force</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span> <span class="n">sample</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="n">None</span><span class="p">:</span>
    <span class="sd">"""Loads the world bank health dataset, slices and a dashboard"""</span>
    <span class="n">tbl_name</span> <span class="o">=</span> <span class="s2">"wb_health_population"</span>
    <span class="n">database</span> <span class="o">=</span> <span class="n">utils</span><span class="o">.</span><span class="n">get_example_database</span><span class="p">()</span>
    <span class="n">engine</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="n">get_sqla_engine</span><span class="p">()</span>
    <span class="n">schema</span> <span class="o">=</span> <span class="n">inspect</span><span class="p">(</span><span class="n">engine</span><span class="p">)</span><span class="o">.</span><span class="n">default_schema_name</span>
    <span class="n">table_exists</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="n">has_table_by_name</span><span class="p">(</span><span class="n">tbl_name</span><span class="p">)</span>
</code></pre></div>
<p>This code does the following:</p>
<ul>
<li>Sets the table name to <code>wb_health_population</code></li>
<li>Uses utility functions to fetch the Database object corresponding to the <code>examples</code> database (or creates it if it isn't there)</li>
<li>Retrieves the SQLAlchemy engine for this specific database flavor (from the <code>db_engine_specs</code> folder) so queries can be made to the database.</li>
<li>Retrieves the default schema name if it exists.</li>
<li>Confirms if the <code>wb_health_population</code> table exists or not.</li>
</ul>
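<p>To make the last bullet concrete, here's a minimal sketch of a table-existence check using only Python's built-in <code>sqlite3</code> module. This is not how Superset's <code>has_table_by_name()</code> is implemented (that goes through SQLAlchemy); the <code>table_exists()</code> helper and the in-memory database are stand-ins I'm assuming for illustration.</p>

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    """Hypothetical helper: True if a table with this name exists in SQLite."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


# In-memory database, so nothing on disk is touched.
conn = sqlite3.connect(":memory:")
tbl_name = "wb_health_population"

exists_before = table_exists(conn, tbl_name)
conn.execute(f"CREATE TABLE {tbl_name} (country_code TEXT, year TEXT)")
exists_after = table_exists(conn, tbl_name)

print(exists_before, exists_after)
```

<p>The same check against a real installation would point <code>sqlite3.connect()</code> at the <code>superset.db</code> file instead of <code>:memory:</code>.</p>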
<h3>World Health Dashboard: Pandas Transformation</h3>
<p>As someone who's spent years writing pandas code, the next part of the <code>load_world_bank_health_n_pop()</code> function looks very familiar:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="k">not</span> <span class="n">only_metadata</span> <span class="k">and</span> <span class="p">(</span><span class="k">not</span> <span class="n">table_exists</span> <span class="k">or</span> <span class="k">force</span><span class="p">)</span><span class="o">:</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">get_example_data</span><span class="p">(</span><span class="s">"countries.json.gz"</span><span class="p">)</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_json</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="n">pdf</span><span class="p">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="n">col</span><span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"."</span><span class="p">,</span> <span class="s">"_"</span><span class="p">)</span> <span class="k">for</span> <span class="n">col</span> <span class="n">in</span> <span class="n">pdf</span><span class="p">.</span><span class="n">columns</span><span class="p">]</span>
<span class="k">if</span> <span class="n">database</span><span class="p">.</span><span class="n">backend</span> <span class="o">==</span> <span class="s">"presto"</span><span class="o">:</span>
<span class="n">pdf</span><span class="p">.</span><span class="n">year</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span><span class="n">pdf</span><span class="p">.</span><span class="n">year</span><span class="p">)</span>
<span class="n">pdf</span><span class="p">.</span><span class="n">year</span> <span class="o">=</span> <span class="n">pdf</span><span class="p">.</span><span class="n">year</span><span class="p">.</span><span class="n">dt</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y-%m-%d %H:%M%:%S"</span><span class="p">)</span>
<span class="k">else</span><span class="o">:</span>
<span class="n">pdf</span><span class="p">.</span><span class="n">year</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">to_datetime</span><span class="p">(</span><span class="n">pdf</span><span class="p">.</span><span class="n">year</span><span class="p">)</span>
<span class="n">pdf</span> <span class="o">=</span> <span class="n">pdf</span><span class="p">.</span><span class="n">head</span><span class="p">(</span><span class="mh">100</span><span class="p">)</span> <span class="k">if</span> <span class="n">sample</span> <span class="k">else</span> <span class="n">pdf</span>
<span class="n">pdf</span><span class="p">.</span><span class="n">to_sql</span><span class="p">(</span>
<span class="n">tbl_name</span><span class="p">,</span>
<span class="n">engine</span><span class="p">,</span>
<span class="n">schema</span><span class="o">=</span><span class="n">schema</span><span class="p">,</span>
<span class="n">if_exists</span><span class="o">=</span><span class="s">"replace"</span><span class="p">,</span>
<span class="n">chunksize</span><span class="o">=</span><span class="mh">50</span><span class="p">,</span>
<span class="n">dtype</span><span class="o">=</span><span class="p">{</span>
<span class="p">#</span> <span class="n">TODO</span><span class="p">(</span><span class="n">bkyryliuk</span><span class="p">)</span><span class="o">:</span> <span class="n">use</span> <span class="n">TIMESTAMP</span> <span class="k">type</span> <span class="k">for</span> <span class="n">presto</span>
<span class="s">"year"</span><span class="o">:</span> <span class="n">DateTime</span> <span class="k">if</span> <span class="n">database</span><span class="p">.</span><span class="n">backend</span> <span class="o">!=</span> <span class="s">"presto"</span> <span class="k">else</span> <span class="n">String</span><span class="p">(</span><span class="mh">255</span><span class="p">),</span>
<span class="s">"country_code"</span><span class="o">:</span> <span class="n">String</span><span class="p">(</span><span class="mh">3</span><span class="p">),</span>
<span class="s">"country_name"</span><span class="o">:</span> <span class="n">String</span><span class="p">(</span><span class="mh">255</span><span class="p">),</span>
<span class="s">"region"</span><span class="o">:</span> <span class="n">String</span><span class="p">(</span><span class="mh">255</span><span class="p">),</span>
<span class="p">},</span>
<span class="n">method</span><span class="o">=</span><span class="s">"multi"</span><span class="p">,</span>
<span class="n">index</span><span class="o">=</span><span class="n">False</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div>
<p>Here's my breakdown of the code:</p>
<ul>
<li><code>data = get_example_data("countries.json.gz")</code>:<ul>
<li><code>get_example_data()</code> is a helper function that fetches the gzipped JSON dataset for this example from <code>https://github.com/apache-superset/examples-data/blob/master/countries.json.gz</code></li>
</ul>
</li>
<li><code>pdf = pd.read_json(data)</code>: <ul>
<li>read in JSON as a pandas dataframe</li>
</ul>
</li>
<li><code>pdf.columns = [col.replace(".", "_") for col in pdf.columns]</code>: <ul>
<li>replace any periods with <code>_</code>, so the database is happy</li>
</ul>
</li>
<li><code>if database.backend == "presto":</code> if the examples Database object points to a Presto database, do some specific datetime conversion for Presto.</li>
<li><code>pdf.to_sql()</code>: use the <code>pandas.DataFrame.to_sql()</code> method to generate a SQLAlchemy 'query' to insert data into the database.</li>
</ul>
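<p>The steps above can be tried in isolation. Here's a rough sketch that runs the same column cleanup, datetime conversion, and <code>to_sql()</code> call against an in-memory SQLite database. The two rows of JSON are made-up placeholders (not real World Bank figures), and I'm passing a plain <code>sqlite3</code> connection instead of the SQLAlchemy engine that Superset uses.</p>

```python
import io
import sqlite3

import pandas as pd

# Placeholder rows standing in for the real countries.json.gz payload.
raw_json = io.StringIO(
    '[{"country_code": "USA", "year": "1960-01-01", "SP.POP.TOTL": 100},'
    ' {"country_code": "IND", "year": "1960-01-01", "SP.POP.TOTL": 200}]'
)

pdf = pd.read_json(raw_json, orient="records")

# Replace periods in column names so they are safe to use as SQL identifiers.
pdf.columns = [col.replace(".", "_") for col in pdf.columns]

# Non-Presto branch: convert the year strings to real datetimes.
pdf["year"] = pd.to_datetime(pdf["year"])

# Write the dataframe to a table, replacing it if it already exists.
conn = sqlite3.connect(":memory:")
pdf.to_sql(
    "wb_health_population",
    conn,
    if_exists="replace",
    chunksize=50,
    index=False,
)

rows = conn.execute(
    "SELECT country_code, SP_POP_TOTL FROM wb_health_population "
    "ORDER BY country_code"
).fetchall()
print(rows)
```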
<p>Phew! That's it for today. Tomorrow, I want to finish understanding how the Superset-specific metadata is loaded.</p>Apache Superset from Scratch: Day 5 (More Flask App)2021-12-28T10:20:00-05:002021-12-28T10:20:00-05:00Srini Kadamatitag:None,2021-12-28:/apache-superset-from-scratch-day-5-more-flask-app.html<p>I ended Day 4 trying to understand how the World Health dashboard was imported. I walked away with a lot of open questions around how the flask app factory pattern worked. After sleeping on it and approaching with fresh eyes, I'm excited to hopefully make more progress today.</p>
<h3>Flask App Context</h3>
<p>I spent the morning reading the following articles from the excellent Flask documentation:</p>
<ul>
<li><a href="https://flask.palletsprojects.com/en/2.0.x/appcontext/">the Application Context</a></li>
<li><a href="https://flask.palletsprojects.com/en/2.0.x/api/#flask.Config">class flask.Config</a></li>
<li><a href="https://flask.palletsprojects.com/en/2.0.x/patterns/appfactories/">Application Factories</a></li>
<li><a href="https://flask.palletsprojects.com/en/2.0.x/config/#configuring-from-python-files">Configuring from Python Files</a></li>
</ul>
<p>After going deep into these, I'll attempt to walk through everything I learned.</p>
<p>As I mentioned in the last post, the crucial entry-point into the Flask application is the <code>create_app()</code> function from <code>superset/superset/app.py</code>. Here's the entire function definition:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">create_app</span><span class="p">()</span> <span class="o">-></span> <span class="n">Flask</span><span class="o">:</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">SupersetApp</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">try</span><span class="o">:</span>
<span class="err">#</span> <span class="n">Allow</span> <span class="n">user</span> <span class="n">to</span> <span class="n">override</span> <span class="n">our</span> <span class="n">config</span> <span class="n">completely</span>
<span class="n">config_module</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"SUPERSET_CONFIG"</span><span class="p">,</span> <span class="s">"superset.config"</span><span class="p">)</span>
<span class="n">app</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">from_object</span><span class="p">(</span><span class="n">config_module</span><span class="p">)</span>
<span class="n">app_initializer</span> <span class="o">=</span> <span class="n">app</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"APP_INITIALIZER"</span><span class="p">,</span> <span class="n">SupersetAppInitializer</span><span class="p">)(</span><span class="n">app</span><span class="p">)</span>
<span class="n">app_initializer</span><span class="p">.</span><span class="n">init_app</span><span class="p">()</span>
<span class="kr">return</span> <span class="n">app</span>
<span class="err">#</span> <span class="n">Make</span> <span class="n">sure</span> <span class="n">that</span> <span class="n">bootstrap</span> <span class="n">errors</span> <span class="n">ALWAYS</span> <span class="n">get</span> <span class="n">logged</span>
<span class="kr">except</span> <span class="n">Exception</span> <span class="kr">as</span> <span class="n">ex</span><span class="o">:</span>
<span class="n">logger</span><span class="p">.</span><span class="n">exception</span><span class="p">(</span><span class="s">"Failed to create app"</span><span class="p">)</span>
<span class="n">raise</span> <span class="n">ex</span>
</code></pre></div>
<p>Within <code>create_app()</code>, the following line of code defines what <code>current_app</code> refers to: </p>
<div class="highlight"><pre><span></span><code>app = SupersetApp(__name__)
</code></pre></div>
<p>The <code>current_app</code> proxy behaves like a global variable that different parts of the application can reference and use. The following line of code reads the <code>SUPERSET_CONFIG</code> environment variable (using <code>os.environ.get()</code>) and falls back to <code>superset.config</code> if it isn't set:</p>
<div class="highlight"><pre><span></span><code>config_module = os.environ.get("SUPERSET_CONFIG", "superset.config")
</code></pre></div>
<p>Then, the configuration information is loaded and attached to the <code>app</code> object (elsewhere in the application it would be referenced as <code>current_app</code>).</p>
<div class="highlight"><pre><span></span><code>app.config.from_object(config_module)
</code></pre></div>
<p>All of the information so far suggests that the <code>SQLALCHEMY_EXAMPLES_URI</code> value is meant to be configured, which makes sense! </p>
<ul>
<li>By default in a native Superset installation, the SQLite database in my home directory is used. </li>
<li>But within the Docker Compose image for Superset, the included Postgres database is used instead.</li>
</ul>
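<p>Here's a small, self-contained sketch of the config loading described above, using plain Flask rather than Superset. The <code>ExampleConfig</code> class and its values are hypothetical stand-ins for <code>superset/config.py</code>; Superset itself passes the import string <code>"superset.config"</code> to <code>from_object()</code>, while I'm passing a class directly to keep the example in one file.</p>

```python
from flask import Flask, current_app


class ExampleConfig:
    """Hypothetical stand-in for superset/config.py."""

    SQLALCHEMY_DATABASE_URI = "sqlite:////tmp/example/superset.db"
    SQLALCHEMY_EXAMPLES_URI = None
    # from_object() only copies UPPERCASE attributes, so this one is skipped.
    not_a_setting = "ignored"


def create_app() -> Flask:
    app = Flask(__name__)
    app.config.from_object(ExampleConfig)
    return app


app = create_app()

# Elsewhere in the code base, the app is reached through current_app,
# which only resolves while an application context is active.
with app.app_context():
    database_uri = current_app.config["SQLALCHEMY_DATABASE_URI"]
    examples_uri = current_app.config.get("SQLALCHEMY_EXAMPLES_URI")
    has_lowercase = "not_a_setting" in current_app.config

print(database_uri, examples_uri, has_lowercase)
```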
<p>There's still SO much I don't understand about Flask, but I need to do a separate, multi-day deep dive into that web framework. I want to balance breadth with depth here and it may be time to move on with the cursory understanding I have.</p>
<blockquote>
<p>Note to self: Go through <a href="https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xv-a-better-application-structure">Flask mega-tutorial</a>, which seems to be consistently recommended by people online!</p>
</blockquote>
<h3>Examples Database</h3>
<p>I want to come back for air, and circle back to how the World Health dashboard is loaded into the Superset metadata database. I want to understand this function better, which is called from the <code>load_world_bank_health_n_pop()</code> function in <code>world_bank.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">get_example_database</span><span class="p">()</span> <span class="o">-></span> <span class="s">"Database"</span><span class="o">:</span>
<span class="n">db_uri</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">current_app</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"SQLALCHEMY_EXAMPLES_URI"</span><span class="p">)</span>
<span class="kr">or</span> <span class="n">current_app</span><span class="p">.</span><span class="n">config</span><span class="p">[</span><span class="s">"SQLALCHEMY_DATABASE_URI"</span><span class="p">]</span>
<span class="p">)</span>
<span class="kr">return</span> <span class="n">get_or_create_db</span><span class="p">(</span><span class="s">"examples"</span><span class="p">,</span> <span class="n">db_uri</span><span class="p">)</span>
</code></pre></div>
<p>The first clause looks interesting:</p>
<div class="highlight"><pre><span></span><code>db_uri = (
current_app.config.get("SQLALCHEMY_EXAMPLES_URI")
or current_app.config["SQLALCHEMY_DATABASE_URI"]
)
</code></pre></div>
<p>This code is attempting to look up the database URI based on the app's configuration settings. We know that <code>current_app.config.get()</code> looks up values from <code>superset/superset/config.py</code>. At 1337 lines of code, the <code>config.py</code> file is massive. It contains code mostly assigning values to all-upper-case variable names. Here's an example:</p>
<div class="highlight"><pre><span></span><code>SQLALCHEMY_EXAMPLES_URI = None
</code></pre></div>
<p>Here's a walkthrough of how <code>db_uri</code> is calculated:</p>
<ul>
<li>The first clause is attempting to find a truthy value, between <code>SQLALCHEMY_EXAMPLES_URI</code> and <code>SQLALCHEMY_DATABASE_URI</code>. </li>
<li>Because by default <code>SQLALCHEMY_EXAMPLES_URI</code> is set to <code>None</code>, the value for <code>SQLALCHEMY_DATABASE_URI</code> is then looked up.</li>
<li>By default, <code>SQLALCHEMY_DATABASE_URI</code> is set to: <code>"sqlite:///" + os.path.join(DATA_DIR, "superset.db")</code></li>
</ul>
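<p>The fallback logic in this walkthrough is just Python's <code>or</code> operator at work. Here's a minimal sketch using a plain dictionary in place of <code>current_app.config</code>; the <code>resolve_examples_uri()</code> helper and the URIs are made up for illustration.</p>

```python
def resolve_examples_uri(config: dict) -> str:
    """Hypothetical helper mirroring the db_uri expression."""
    # .get() returns None for a missing or unset key, and `or` then falls
    # through to the second operand, which must exist (hence the [] lookup).
    return config.get("SQLALCHEMY_EXAMPLES_URI") or config["SQLALCHEMY_DATABASE_URI"]


default_config = {
    "SQLALCHEMY_EXAMPLES_URI": None,
    "SQLALCHEMY_DATABASE_URI": "sqlite:////tmp/example/superset.db",
}
custom_config = {
    "SQLALCHEMY_EXAMPLES_URI": "postgresql://user:pass@localhost/examples",
    "SQLALCHEMY_DATABASE_URI": "sqlite:////tmp/example/superset.db",
}

default_uri = resolve_examples_uri(default_config)
custom_uri = resolve_examples_uri(custom_config)
print(default_uri)
print(custom_uri)
```

<p>Note that <code>or</code> checks truthiness, so an empty string for <code>SQLALCHEMY_EXAMPLES_URI</code> would also fall through to the metadata database URI.</p>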
<p>Now we're getting somewhere! The <code>sqlite:///</code> and <code>"superset.db"</code> parts <em>smell</em> a lot like the location of the SQLite metadata database that lives in my home directory that I dug up in <a href="/apache-superset-from-scratch-day-2-metadata-database.html">my Day 2 post</a>:</p>
<div class="highlight"><pre><span></span><code>cat ~/.superset/superset.db
</code></pre></div>
<p>But what's this <code>DATA_DIR</code> value and how is it computed? I did a quick search within <code>superset/superset/config.py</code> and the first instance of <code>DATA_DIR</code> is referenced here:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="s2">"</span><span class="s">SUPERSET_HOME</span><span class="s2">"</span> <span class="nv">in</span> <span class="nv">os</span>.<span class="nv">environ</span>:
<span class="nv">DATA_DIR</span> <span class="o">=</span> <span class="nv">os</span>.<span class="nv">environ</span>[<span class="s2">"</span><span class="s">SUPERSET_HOME</span><span class="s2">"</span>]
<span class="k">else</span>:
<span class="nv">DATA_DIR</span> <span class="o">=</span> <span class="nv">os</span>.<span class="nv">path</span>.<span class="nv">join</span><span class="ss">(</span><span class="nv">os</span>.<span class="nv">path</span>.<span class="nv">expanduser</span><span class="ss">(</span><span class="s2">"</span><span class="s">~</span><span class="s2">"</span><span class="ss">)</span>, <span class="s2">"</span><span class="s">.superset</span><span class="s2">"</span><span class="ss">)</span>
</code></pre></div>
<p>Because I didn't specifically set <code>SUPERSET_HOME</code> in my environment variables, the second code path is evaluated instead:</p>
<div class="highlight"><pre><span></span><code>DATA_DIR = os.path.join(os.path.expanduser("~"), ".superset")
</code></pre></div>
<p>I quickly ran this in a new Python shell and the result mapped exactly to the <code>.superset/</code> folder within my home directory:</p>
<p><img alt="Data Dir" src="/images/data_dir.png"></p>
<p>This means that <code>SQLALCHEMY_DATABASE_URI</code> points to my metadata database, as expected. Progress!</p>
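<p>Both code paths can be checked without touching the real environment. The sketch below wraps the same logic in two hypothetical helper functions (they don't exist in Superset) that take the environment as a plain mapping, so I can try it with and without <code>SUPERSET_HOME</code> set. The <code>/opt/superset</code> path is an arbitrary example.</p>

```python
import os
from typing import Mapping


def resolve_data_dir(environ: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring the DATA_DIR logic in config.py."""
    if "SUPERSET_HOME" in environ:
        return environ["SUPERSET_HOME"]
    return os.path.join(os.path.expanduser("~"), ".superset")


def default_database_uri(data_dir: str) -> str:
    """Build the default SQLite URI from a data directory."""
    return "sqlite:///" + os.path.join(data_dir, "superset.db")


# Without SUPERSET_HOME: falls back to ~/.superset
home_based = resolve_data_dir({})

# With SUPERSET_HOME: the environment variable wins
overridden = resolve_data_dir({"SUPERSET_HOME": "/opt/superset"})

print(default_database_uri(home_based))
print(default_database_uri(overridden))
```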
<p>Finally, this means that the <code>get_example_database()</code> function will return a Database object that points to my SQLite database, creating that object first if it doesn't exist (as the name <code>get_or_create_db()</code> suggests):</p>
<div class="highlight"><pre><span></span><code> <span class="k">return</span> <span class="nv">get_or_create_db</span><span class="ss">(</span><span class="s2">"</span><span class="s">examples</span><span class="s2">"</span>, <span class="nv">db_uri</span><span class="ss">)</span>
</code></pre></div>
<p>The return value of <code>utils.get_example_database()</code> is assigned to the <code>database</code> variable.</p>
<h3>Superset Shell</h3>
<p>While reading function definitions is great, the only way to learn technical concepts is getting your hands dirty and actually running code yourself. </p>
<p>What's the best way to actually accomplish this though, while having the application lifecycle state loaded for me to interact with?</p>
<p>Some searching online led me to this <a href="https://flask.palletsprojects.com/en/2.0.x/cli/#open-a-shell">page in the Flask docs</a>, which mentions the following:</p>
<blockquote>
<p>To explore the data in your application, you can start an interactive Python shell with the shell command. An application context will be active, and the app instance will be imported.</p>
</blockquote>
<p>I also know that Superset extends many of the underlying Flask metaphors and I remember seeing <code>superset shell</code> listed when running the Superset CLI:</p>
<div class="highlight"><pre><span></span><code>...
run Run a development server.
set-database-uri Updates a database connection URI
shell Run a shell in the app context.
sync-tags Rebuilds special tags (owner, type, favorited
...
</code></pre></div>
<p>I'm going to try this out:</p>
<div class="highlight"><pre><span></span><code>superset shell
</code></pre></div>
<p>Excellent! I now have a shell environment with the Superset App context loaded in:</p>
<p><img alt="Superset Shell" src="/images/superset_shell.png"></p>
<h3>Next Steps</h3>
<p>I've run out of time for the day and will end here. Next, I want to step through all of the function calls in the World Health dashboard example using the Superset shell.</p>Apache Superset from Scratch: Day 4 (Superset & Flask Entrypoint)2021-12-26T10:20:00-05:002021-12-26T10:20:00-05:00Srini Kadamatitag:None,2021-12-26:/apache-superset-from-scratch-day-4-superset-flask-entrypoint.html<p>I ended Day 3 by setting up the backend and frontend servers, but running into the following error.</p>
<p><img alt="Superset UI" src="/images/superset_ui.png"></p>
<p>For good measure, I'm going to shut down and restart the backend server. Success!</p>
<p><img alt="Superset UI Fixed" src="/images/superset_ui_fixed.png"></p>
<p>No thumbnails though ☹️. Well I do know from previous experience that thumbnails require setting up a separate celery server. This will need separate investigation.</p>
<p>Now that I have Superset up and running, what should I look into next? I really want to dive deeper into how the example datasets, charts, and dashboards are loaded. This will force me to better understand the internal Superset data model.</p>
<h3>How the Superset Examples Work</h3>
<p>The world health dashboard looks interesting and like one of the more complex ones. I'll start by poking deeper into this one:</p>
<p><img alt="World Health Dashboards" src="/images/world_health_dashboard.png"></p>
<p>Here's the relevant function call from <code>superset/cli.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nb">print</span><span class="p">(</span><span class="s2">"Loading [World Bank's Health Nutrition and Population Stats]"</span><span class="p">)</span>
<span class="n">examples</span><span class="o">.</span><span class="n">load_world_bank_health_n_pop</span><span class="p">(</span><span class="n">only_metadata</span><span class="p">,</span> <span class="n">force</span><span class="p">)</span>
</code></pre></div>
<p>Sooo, let's get into it. Where does the second line of code actually point to? As I mentioned in <a href="/apache-superset-from-scratch-day-2-metadata-database.html">Day 2</a>, the <code>superset/superset/examples/__init__.py</code> file contains mappings like this one:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">.world_bank</span> <span class="kn">import</span> <span class="n">load_world_bank_health_n_pop</span>
</code></pre></div>
<p>This means that the <code>load_world_bank_health_n_pop()</code> function lives in <code>examples/world_bank.py</code>! Here's a preview of the first few lines:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">load_world_bank_health_n_pop</span><span class="p">(</span> <span class="c1"># pylint: disable=too-many-locals, too-many-statements</span>
<span class="n">only_metadata</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span> <span class="n">force</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span> <span class="n">sample</span><span class="p">:</span> <span class="nb nb-Type">bool</span> <span class="o">=</span> <span class="n">False</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="n">None</span><span class="p">:</span>
<span class="sd">"""Loads the world bank health dataset, slices and a dashboard"""</span>
<span class="n">tbl_name</span> <span class="o">=</span> <span class="s2">"wb_health_population"</span>
<span class="n">database</span> <span class="o">=</span> <span class="n">utils</span><span class="o">.</span><span class="n">get_example_database</span><span class="p">()</span>
<span class="n">engine</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="n">get_sqla_engine</span><span class="p">()</span>
<span class="n">schema</span> <span class="o">=</span> <span class="n">inspect</span><span class="p">(</span><span class="n">engine</span><span class="p">)</span><span class="o">.</span><span class="n">default_schema_name</span>
<span class="n">table_exists</span> <span class="o">=</span> <span class="n">database</span><span class="o">.</span><span class="n">has_table_by_name</span><span class="p">(</span><span class="n">tbl_name</span><span class="p">)</span>
</code></pre></div>
<p>The first line almost surely refers to the table name in the examples database.</p>
<div class="highlight"><pre><span></span><code>tbl_name = "wb_health_population"
</code></pre></div>
<p>How can I confirm this? The fastest way is probably to crack open SQL Lab and inspect the table name for the examples database.</p>
<p><img alt="World Health Table" src="/images/wb_sqllab.png"></p>
<p>Confirmed. Let's check out the next line:</p>
<div class="highlight"><pre><span></span><code>database = utils.get_example_database()
</code></pre></div>
<p>Ah yes, the art of the <code>utils</code>! The perfect hiding place for some arbitrary functions. Where does this point to?</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">superset.utils</span> <span class="kn">import</span> <span class="n">core</span> <span class="k">as</span> <span class="n">utils</span>
</code></pre></div>
<p>So there should be a <code>utils/core.py</code> file. Oh boy, this file is 1835 lines of code. But it does have the <code>get_example_database()</code> function that's called. The function definition is pretty short so I'm including it here in its entirety:</p>
<div class="highlight"><pre><span></span><code><span class="n">def</span> <span class="n">get_example_database</span><span class="p">()</span> <span class="o">-></span> <span class="s">"Database"</span><span class="o">:</span>
<span class="n">db_uri</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">current_app</span><span class="p">.</span><span class="n">config</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="s">"SQLALCHEMY_EXAMPLES_URI"</span><span class="p">)</span>
<span class="kr">or</span> <span class="n">current_app</span><span class="p">.</span><span class="n">config</span><span class="p">[</span><span class="s">"SQLALCHEMY_DATABASE_URI"</span><span class="p">]</span>
<span class="p">)</span>
<span class="kr">return</span> <span class="n">get_or_create_db</span><span class="p">(</span><span class="s">"examples"</span><span class="p">,</span> <span class="n">db_uri</span><span class="p">)</span>
</code></pre></div>
<p>Now we're getting somewhere! This function tells the app which database is designated as the <strong>examples</strong> one. First things first, where is <code>current_app</code> defined?</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">current_app</span><span class="p">,</span> <span class="n">flash</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">Markup</span><span class="p">,</span> <span class="n">render_template</span><span class="p">,</span> <span class="n">request</span>
</code></pre></div>
<p>The <code>current_app</code> object is imported from <a href="https://flask.palletsprojects.com/en/2.0.x/">flask</a>, which is a Python microframework for creating web applications. </p>
<blockquote>
<p>Note to self: I need to dig into the relationship between Flask and Flask App-builder later on.</p>
</blockquote>
<p>After a quick Google search, I found the page in the Flask documentation on <a href="https://flask.palletsprojects.com/en/2.0.x/appcontext/">the current application context</a>. Perusing this page, it reminds me a lot of the <code>request</code> object from my Ruby on Rails days, which was a way of passing application state around.</p>
<p>This is essentially a way to reference values specific to the currently running instance of the Superset application.</p>
<blockquote>
<p>The application context keeps track of the application-level data during a request, CLI command, or other activity. Rather than passing the application around to each function, the current_app and g proxies are accessed instead.</p>
<p>This is similar to The Request Context, which keeps track of request-level data during a request. A corresponding application context is pushed when a request context is pushed.</p>
</blockquote>
<p>Well what do you know! There's a call out to the <a href="https://flask.palletsprojects.com/en/2.0.x/reqcontext/">flask version of the request object</a> that's similar to the one from Rails! My hunch was spot on.</p>
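<p>To see what "pushing" an application context means in practice, here's a short sketch with a bare Flask app (not Superset). The <code>EXAMPLE_SETTING</code> key and the <code>g.visits</code> attribute are names I made up for illustration. Outside of a context, touching <code>current_app</code> raises a <code>RuntimeError</code>; inside one, it resolves to the app that pushed the context.</p>

```python
from flask import Flask, current_app, g

app = Flask(__name__)
app.config["EXAMPLE_SETTING"] = "hello"

# Outside an application context, the current_app proxy has nothing to
# point at, so accessing it raises RuntimeError.
try:
    current_app.config["EXAMPLE_SETTING"]
    worked_outside = True
except RuntimeError:
    worked_outside = False

# Inside an application context, current_app and g are both usable.
with app.app_context():
    inside_value = current_app.config["EXAMPLE_SETTING"]
    g.visits = 1  # g holds data for the lifetime of this context
    visits = g.visits

print(worked_outside, inside_value, visits)
```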
<h3>App Factory Pattern</h3>
<p>To take a step back, what actually happens when I run <code>superset run</code> from the CLI?</p>
<p>The <a href="https://flask.palletsprojects.com/en/2.0.x/tutorial/factory/">flask documentation</a> suggests that the <code>def create_app()</code> function definition is likely to be the <strong>entrypoint</strong> to the flask application at the core of Superset that is called when <code>flask run</code> or an equivalent is run.</p>
<p>A quick search in my text editor found only one non-test-related definition for <code>create_app()</code>, found in <code>superset/superset/app.py</code>:</p>
<p><img alt="App Py" src="/images/app_py.png"></p>
<p>These lines look especially relevant to our investigation of:</p>
<ul>
<li>how the World Health dashboard is loaded</li>
<li>how <code>current_app.config.get("SQLALCHEMY_EXAMPLES_URI")</code> resolves</li>
<li>what <code>current_app</code> actually refers to</li>
</ul>
<div class="highlight"><pre><span></span><code>try:
# Allow user to override our config completely
config_module = os.environ.get("SUPERSET_CONFIG", "superset.config")
app.config.from_object(config_module)
</code></pre></div>
<h3>Next Steps</h3>
<p>I feel closer to understanding all of the links here, but sadly I'm out of time today. Here are the open questions I still have:</p>
<ul>
<li>What's the link between <code>create_app()</code> and <code>current_app</code>?</li>
<li>Where is the value for <code>SQLALCHEMY_EXAMPLES_URI</code> actually set?</li>
<li>How does this line of code actually work: <code>os.environ.get("SUPERSET_CONFIG", "superset.config")</code>?</li>
</ul>Apache Superset from Scratch: Day 3 (Frontend Setup)2021-12-25T10:20:00-05:002021-12-25T10:20:00-05:00Srini Kadamatitag:None,2021-12-25:/apache-superset-from-scratch-day-3-frontend-setup.html<p>In Day 3, I'm going to dive into setting up the frontend. In general, I'm quite new to the frontend ecosystem, so expect lots of tangents to fill in knowledge gaps along the way!</p>
<p>We'll start with the <a href="https://github.com/apache/superset/blob/master/CONTRIBUTING.md#frontend">Frontend section from CONTRIBUTING.MD</a>.</p>
<p>The first paragraph has some helpful historical context:</p>
<blockquote>
<p>Frontend assets (TypeScript, JavaScript, CSS, and images) must be compiled in order to properly display the web UI. The superset-frontend directory contains all NPM-managed frontend assets. Note that for some legacy pages there are additional frontend assets bundled with Flask-Appbuilder (e.g. jQuery and bootstrap). These are not managed by NPM and may be phased out in the future.</p>
</blockquote>
<h3>Node</h3>
<p>Thankfully, I've used Node a little bit before. Let me check what version I have installed on this computer. Usually the <code>--version</code> flag will do the trick!</p>
<div class="highlight"><pre><span></span><code>node --version
> v17.3.0
</code></pre></div>
<p>The guide recommends Node 16, but Node 17.x should be fine. Let's check the <code>npm</code> version next. Npm is the Node package manager:</p>
<div class="highlight"><pre><span></span><code>npm --version
> 8.3.0
</code></pre></div>
<p>The guide recommends using <code>nvm</code> to manage different Node versions. This is helpful advice, but I don't want to prematurely optimize and add more abstraction / complexity than needed. So let's soldier on for now.</p>
<h3>Package.json</h3>
<p>The <code>package.json</code> is the Node equivalent to Python's <code>requirements.txt</code> file. For Superset, the <code>package.json</code> file lives within the <code>superset/superset-frontend/</code> folder. Let's switch into that folder.</p>
<p>What's actually in this file? A. LOT. Let's break some of this down.</p>
<div class="highlight"><pre><span></span><code>{
"name": "superset",
"version": "0.0.0dev",
"description": "Superset is a data exploration platform designed to be visual, intuitive, and interactive.",
"keywords": [
"big",
"data",
"exploratory",
"analysis",
"react",
"d3",
"airbnb",
"nerds",
"database",
"flask"
],
</code></pre></div>
<p>This line is interesting: <code>"version": "0.0.0dev"</code>. I wonder if this is where the version number shown in the Superset UI comes from?</p>
<p><em>As a quick detour, I wonder what this value in the <code>package.json</code> file <a href="https://github.com/apache/superset/blob/1.4/superset-frontend/package.json">in the Superset v1.4 release</a> looks like?</em></p>
<p><img alt="Superset CLI" src="/images/superset_14_package.png"></p>
<p>My hunch was right! 1.4 is hardcoded as a string in the <code>package.json</code> file. Cool!</p>
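<p>Any code that wants this version string only needs a JSON parser. Here's a minimal sketch in Python (the language of the Superset backend) that reads the field from an excerpt of the file shown earlier. I haven't checked whether Superset itself reads the version this way, so treat it as an illustration only.</p>

```python
import json

# Excerpt of the package.json fields shown earlier (not the full file).
package_json_text = """
{
  "name": "superset",
  "version": "0.0.0dev",
  "keywords": ["big", "data", "react", "d3"]
}
"""

package = json.loads(package_json_text)
# Fall back to a placeholder if the field is ever missing.
version = package.get("version", "unknown")
print(f"{package['name']} frontend version: {version}")
```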
<p>Then we can run <code>npm install</code>, which should use the <code>superset-frontend/package.json</code> file. But the documentation suggests <code>npm ci</code>. <a href="https://stackoverflow.com/a/53325242">Searching online suggests</a> using <code>npm ci</code> if there's an existing <code>package-lock.json</code> file.</p>
<p>Because the project has an existing <code>package-lock.json</code> file, let's use <code>npm ci</code>!</p>
<p><img alt="npm ci" src="/images/npm_ci_1.png"></p>
<p>In the first half of the CLI output, I see that npm installed 5009 packages and displayed a bunch of deprecation warnings.</p>
<p><img alt="npm ci" src="/images/npm_ci_2.png"></p>
<p>In the second half of the CLI output, I see that there are 111 vulnerabilities. I'm noting both of these down (through this post!) to investigate later.</p>
<h3>Build Frontend Assets</h3>
<p>Next, as the guide suggests, I will run <code>npm run build</code>. After a few minutes, I was presented with many warnings but some indication that the build succeeded?</p>
<p><img alt="npm run build" src="/images/npm_run_build.png"></p>
<p>Next, we can start the dev server at port <code>9000</code> by running:</p>
<div class="highlight"><pre><span></span><code>npm run dev-server
</code></pre></div>
<p>Here's what the CLI output looks like with both the frontend and backend running simultaneously:</p>
<p><img alt="backend and frontend" src="/images/backend_frontend.png"></p>
<p>Exciting! Now if I head to <code>localhost:8088</code>, I should see Superset:</p>
<p><img alt="Superset UI" src="/images/superset_ui.png"></p>
<p>Hmm, that's curious. I'm logged in as the admin and I'm still seeing issues.</p>
<p>Unfortunately I'm out of time for today, so I'll have to debug this on Day 4!</p>Apache Superset from Scratch: Day 2 (Metadata Database)2021-12-24T10:20:00-05:002021-12-24T10:20:00-05:00Srini Kadamatitag:None,2021-12-24:/apache-superset-from-scratch-day-2-metadata-database.html<p>In Day 1, I set up the backend Python dependencies. Now, I'm going to start the metadata database. The next step, as laid out in <a href="https://github.com/apache/superset/blob/master/CONTRIBUTING.md#setup-local-environment-for-development">CONTRIBUTING.MD</a>, is to run:</p>
<div class="highlight"><pre><span></span><code>superset db upgrade
</code></pre></div>
<h3>Superset CLI</h3>
<p>Before we do that, I want to get more familiar with the Superset CLI. If you recall from the last post, running <code>superset</code> in the command line exposes a number of interesting commands we could run:</p>
<p><img alt="Superset CLI" src="/images/superset_cli2.png"></p>
<p>Some interesting commands that stick out:</p>
<ul>
<li>db: Perform database migrations.</li>
<li>export-dashboards: Export dashboards to JSON</li>
<li>fab: FAB flask group commands</li>
<li>init: Inits the Superset application</li>
</ul>
<p>Where does the code for these CLI commands live? After some searches in the Superset codebase, it's clear they live in the <code>superset/cli.py</code> file. The CLI commands listed above map to function definitions. For example, here's the function definition for <code>superset init</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nv">@superset</span><span class="p">.</span><span class="n">command</span><span class="p">()</span><span class="w"></span>
<span class="nv">@with_appcontext</span><span class="w"></span>
<span class="n">def</span><span class="w"> </span><span class="n">init</span><span class="p">()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="k">None</span><span class="err">:</span><span class="w"></span>
<span class="w"> </span><span class="ss">"""Inits the Superset application"""</span><span class="w"></span>
<span class="w"> </span><span class="n">appbuilder</span><span class="p">.</span><span class="n">add_permissions</span><span class="p">(</span><span class="n">update_perms</span><span class="o">=</span><span class="k">True</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="n">security_manager</span><span class="p">.</span><span class="n">sync_role_definitions</span><span class="p">()</span><span class="w"></span>
</code></pre></div>
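<p>To picture how a decorator can turn plain functions into CLI commands, here's a toy registry written with only the standard library. It is not how Click (the library behind <code>@superset.command()</code>) is implemented, and the function bodies are placeholders. It only mimics one behavior that, as far as I know, recent Click versions have: the command name is derived from the function name, with underscores swapped for dashes.</p>

```python
from typing import Callable, Dict

# Toy registry: command name -> function. This is NOT Click's implementation.
commands: Dict[str, Callable[[], None]] = {}


def command(func: Callable[[], None]) -> Callable[[], None]:
    """Register a function under a CLI-style name (underscores become dashes)."""
    commands[func.__name__.replace("_", "-")] = func
    return func


@command
def init() -> None:
    """Placeholder for the real init command."""
    print("creating roles and permissions...")


@command
def export_dashboards() -> None:
    """Placeholder for the real export-dashboards command."""
    print("exporting dashboards...")


def run(name: str) -> None:
    """Look up a command by its CLI name and call it."""
    commands[name]()


print(sorted(commands))
run("export-dashboards")
```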
<p>It looks like there's no function declaration that maps to the <code>superset db</code> CLI command, but instead the <code>db</code> namespace is imported from another file:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">superset.extensions</span> <span class="kn">import</span> <span class="n">celery_app</span><span class="p">,</span> <span class="n">db</span>
</code></pre></div>
<p>If we jump to <code>superset/extensions.py</code>, we then see:</p>
<div class="highlight"><pre><span></span><code>db = SQLA()
</code></pre></div>
<p><code>SQLA()</code> sounds like SQLAlchemy. Where is it defined or imported?</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">flask_appbuilder</span> <span class="kn">import</span> <span class="n">AppBuilder</span><span class="p">,</span> <span class="n">SQLA</span>
</code></pre></div>
<p>Neat! I know that Superset is built on top of Flask App Builder (or FAB for short), so this must be one of the important touchpoints. We'll avoid continuing down the rabbit hole for now, and dive deeper into FAB another day.</p>
<p>Let's ask the CLI to list out all of the available commands within <code>superset db</code>:</p>
<p><img alt="Superset CLI db" src="/images/superset_cli_db.png"></p>
<p>Neat! Let's run <code>superset db upgrade</code> now. As expected, a bunch of historical database migrations were run and applied.</p>
<p><img alt="Superset db upgrade" src="/images/superset_db_upgrade.png"></p>
<h3>Where does the metadata database live?</h3>
<p>Apparently, <em>some</em> database somewhere was upgraded. But where does that database actually live? After some exploring online, it seems that by default this database resides as a single SQLite database file over in my home directory:</p>
<div class="highlight"><pre><span></span><code>cat ~/.superset/superset.db
</code></pre></div>
<p>Running this command returns a long list of all the schema definitions. This is cool! I look forward to understanding the schemas later.</p>
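<p>Since <code>superset.db</code> is a regular SQLite file, Python's built-in <code>sqlite3</code> module gives a tidier view than <code>cat</code>. The sketch below lists table names from any SQLite file. The <code>list_tables()</code> helper is my own, and the demo database and its two tables are throwaway placeholders so that the code runs without Superset installed.</p>

```python
import os
import sqlite3
import tempfile
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names stored in a SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demo on a throwaway database so the sketch runs without Superset installed.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
demo = sqlite3.connect(demo_path)
demo.execute("CREATE TABLE css_templates (id INTEGER PRIMARY KEY, css TEXT)")
demo.execute("CREATE TABLE dashboards (id INTEGER PRIMARY KEY)")
demo.commit()
demo.close()

print(list_tables(demo_path))
```

<p>Pointing the helper at the real file would look like <code>list_tables(os.path.expanduser("~/.superset/superset.db"))</code>.</p>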
<h3>Creating default roles</h3>
<p>Next up, we need to create an admin user in our metadata database (fancy word for our little SQLite database!):</p>
<div class="highlight"><pre><span></span><code>superset fab create-admin
</code></pre></div>
<p>Before we run the full command, what CLI commands are available within the <code>superset fab</code> namespace?</p>
<p><img alt="Superset CLI fab" src="/images/superset_cli_fab.png"></p>
<p>The commands here let us create admin users, create regular users, create database objects, reset a user's password, and more. Let's create an admin user by running <code>superset fab create-admin</code>. To keep this simple during exploration, I just answered <strong>admin</strong> for every line in the wizard:</p>
<p><img alt="Superset fab create-admin" src="/images/fab_create_admin.png"></p>
<p>We now have an admin username (<strong>admin</strong>) and password (<strong>admin</strong>) combination for logging in to Superset, when the time is right. Next, let's create the rest of the roles and permissions:</p>
<div class="highlight"><pre><span></span><code>superset init
</code></pre></div>
<p>It's interesting that this command isn't part of the <code>superset fab</code> command list.</p>
<h3>Example Data</h3>
<p>Let's load up the example datasets and dashboards, many of which were actually created by yours truly!</p>
<div class="highlight"><pre><span></span><code><span class="n">superset</span> <span class="nb">load</span><span class="o">-</span><span class="n">examples</span>
</code></pre></div>
<p>What all is loaded? How does this actually work? For fun, let's dive into the functions & relevant codepaths. Let's start with the function definition for <code>superset load-examples</code>. Python function names can't contain dashes, so we need to look for <code>load_examples()</code> in <code>superset/cli.py</code> instead. Here's the function declaration:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@with_appcontext</span>
<span class="nd">@superset</span><span class="o">.</span><span class="n">command</span><span class="p">()</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"--load-test-data"</span><span class="p">,</span> <span class="s2">"-t"</span><span class="p">,</span> <span class="n">is_flag</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Load additional test data"</span><span class="p">)</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">"--load-big-data"</span><span class="p">,</span> <span class="s2">"-b"</span><span class="p">,</span> <span class="n">is_flag</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Load additional big data"</span><span class="p">)</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">option</span><span class="p">(</span>
    <span class="s2">"--only-metadata"</span><span class="p">,</span> <span class="s2">"-m"</span><span class="p">,</span> <span class="n">is_flag</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Only load metadata, skip actual data"</span><span class="p">,</span>
<span class="p">)</span>
<span class="nd">@click</span><span class="o">.</span><span class="n">option</span><span class="p">(</span>
    <span class="s2">"--force"</span><span class="p">,</span> <span class="s2">"-f"</span><span class="p">,</span> <span class="n">is_flag</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s2">"Force load data even if table already exists"</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">load_examples</span><span class="p">(</span>
    <span class="n">load_test_data</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
    <span class="n">load_big_data</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span>
    <span class="n">only_metadata</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
    <span class="n">force</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-></span> <span class="kc">None</span><span class="p">:</span>
    <span class="sd">"""Loads a set of Slices and Dashboards and a supporting dataset"""</span>
    <span class="n">load_examples_run</span><span class="p">(</span><span class="n">load_test_data</span><span class="p">,</span> <span class="n">load_big_data</span><span class="p">,</span> <span class="n">only_metadata</span><span class="p">,</span> <span class="n">force</span><span class="p">)</span>
</code></pre></div>
<p>While most of the code focuses on the possible CLI options & function parameters, the actual function body is a single line:</p>
<div class="highlight"><pre><span></span><code><span class="n">load_examples_run</span><span class="p">(</span><span class="n">load_test_data</span><span class="p">,</span> <span class="n">load_big_data</span><span class="p">,</span> <span class="n">only_metadata</span><span class="p">,</span> <span class="n">force</span><span class="p">)</span>
</code></pre></div>
<p>If we jump to that function declaration, it's much much longer. This must be where the meat of the logic is for loading examples. Here's a screenshot of just the first half!</p>
<p><img alt="Load Examples Run" src="/images/load_examples_run.png"></p>
<p>This line looks interesting:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">superset</span> <span class="kn">import</span> <span class="n">examples</span>
</code></pre></div>
<p>If I poke through the file structure for Superset, I find a folder dedicated to examples (<code>superset/examples</code>). The <code>__init__.py</code> file for this folder defines each function mapping:</p>
<p><img alt="Examples Directory" src="/images/examples_directory.png"></p>
<p>Cool! What should I look at next?</p>
<div class="highlight"><pre><span></span><code><span class="n">examples</span><span class="o">.</span><span class="n">load_css_templates</span><span class="p">()</span>
</code></pre></div>
<p>Superset ships with two default CSS templates for dashboards, so this code is likely how that data is loaded. Let's crack open the <code>load_css_templates()</code> function, which lives in <code>superset/examples/load_css_templates.py</code>.</p>
<p><img alt="Load CSS Templates" src="/images/load_css_templates.png"></p>
<p>Each CSS template is loaded one after another. Let's step through the key parts of the code to better understand it.</p>
<div class="highlight"><pre><span></span><code>obj = db.session.query(CssTemplate).filter_by(template_name="Flat").first()
</code></pre></div>
<p>Here we see the <code>db</code> object again, from earlier. Unsurprisingly, there's a matching import statement for it:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">superset</span> <span class="kn">import</span> <span class="n">db</span>
<span class="kn">from</span> <span class="nn">superset.models.core</span> <span class="kn">import</span> <span class="n">CssTemplate</span>
</code></pre></div>
<p>The CssTemplate data model itself looks very simple, as defined in <code>superset/models/core.py</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="n">CssTemplate</span>(<span class="n">Model</span>, <span class="n">AuditMixinNullable</span>):
    <span class="s">"""CSS templates for dashboards"""</span>
    <span class="n">__tablename__</span> = <span class="s">"css_templates"</span>
    <span class="nb">id</span> = <span class="n">Column</span>(<span class="n">Integer</span>, <span class="n">primary_key</span>=<span class="nb">True</span>)
    <span class="n">template_name</span> = <span class="n">Column</span>(<span class="n">String</span>(<span class="mi">250</span>))
    <span class="n">css</span> = <span class="n">Column</span>(<span class="n">Text</span>, <span class="k">default</span>=<span class="s">""</span>)
</code></pre></div>
<p>As a mental note to myself, this table is named <strong>css_templates</strong> in the metadata database.</p>
<p>The rest of the code <em>smells</em> a lot like SQLAlchemy syntax:</p>
<div class="highlight"><pre><span></span><code>db.session.query(CssTemplate).filter_by(template_name="Flat").first()
</code></pre></div>
<p>While I'm not too familiar with the Superset data model yet, this code likely:</p>
<ul>
<li>Attaches to a SQLAlchemy session / transaction</li>
<li>Queries the metadata database, searching for a matching CssTemplate object with the name <strong>Flat</strong></li>
<li>And the <code>first()</code> at the end runs the query and returns the first matching row, or <code>None</code> if nothing matches</li>
</ul>
<p>The goal here is likely to search for an existing entry in the metadata database for the <strong>Flat</strong> CSS template. If no existing entry is found, a new CssTemplate object is instantiated so it can be inserted later:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span> <span class="nv">not</span> <span class="nv">obj</span>:
    <span class="nv">obj</span> <span class="o">=</span> <span class="nv">CssTemplate</span><span class="ss">(</span><span class="nv">template_name</span><span class="o">=</span><span class="s2">"</span><span class="s">Flat</span><span class="s2">"</span><span class="ss">)</span>
</code></pre></div>
<p>Then, the CSS itself is defined as a hard-coded string (shortened extensively below):</p>
<div class="highlight"><pre><span></span><code><span class="nt">css</span> <span class="o">=</span> <span class="nt">textwrap</span><span class="p">.</span><span class="nc">dedent</span><span class="o">(</span>
<span class="s2">"""\</span>
<span class="s2"> .navbar {</span>
<span class="s2"> transition: opacity 0.5s ease;</span>
<span class="s2"> opacity: 0.05;</span>
<span class="s2"> }</span>
<span class="s2"> ....</span>
<span class="s2"> """</span>
<span class="o">)</span>
</code></pre></div>
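<p>In case <code>textwrap.dedent()</code> is unfamiliar: it strips the leading whitespace that all lines share, which lets the CSS sit neatly indented inside the Python source. A small sketch with a made-up, shortened CSS string:</p>

```python
import textwrap

css = textwrap.dedent(
    """\
    .navbar {
        opacity: 0.05;
    }
    """
)

# The common leading whitespace is gone, but the inner indentation is kept.
print(css)
```

<p>The backslash right after the opening quotes keeps the string from starting with an empty line.</p>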
<p>Finally, the string is assigned to the instantiated CssTemplate object's <code>css</code> column and inserted into the metadata database:</p>
<div class="highlight"><pre><span></span><code>obj.css = css
db.session.merge(obj)
db.session.commit()
</code></pre></div>
<p>This whole process is then repeated to add the <strong>Courier Black</strong> CSS template.</p>
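<p>The look-up, create-if-missing, then save pattern above means the examples can be loaded more than once without creating duplicates. Here's a rough sketch of the same idea using the standard library's <code>sqlite3</code> module rather than SQLAlchemy. The <code>load_template()</code> helper and the CSS strings are placeholders of mine, not Superset code.</p>

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE css_templates ("
    "id INTEGER PRIMARY KEY, template_name TEXT, css TEXT DEFAULT '')"
)


def load_template(name: str, css: str) -> None:
    """Insert the template if it is missing, otherwise update its CSS."""
    row = connection.execute(
        "SELECT id FROM css_templates WHERE template_name = ?", (name,)
    ).fetchone()  # first match, or None when nothing matches
    if row is None:
        connection.execute(
            "INSERT INTO css_templates (template_name, css) VALUES (?, ?)",
            (name, css),
        )
    else:
        connection.execute(
            "UPDATE css_templates SET css = ? WHERE id = ?", (css, row[0])
        )
    connection.commit()


# Loading twice must not create duplicates; the CSS strings are placeholders.
for _ in range(2):
    load_template("Flat", ".navbar { opacity: 0.05; }")
    load_template("Courier Black", "h2 { color: white; }")

count = connection.execute("SELECT COUNT(*) FROM css_templates").fetchone()[0]
print(count)
```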
<p>Phew! This was just the CSS templates. No example datasets or example dashboards yet. Because I'm running out of time today, I'll circle back to the code paths for those on a later day.</p>
<h3>Starting Flask Server</h3>
<p>The last step now is to fire up the Flask server and see how Superset looks in the web browser.</p>
<div class="highlight"><pre><span></span><code><span class="n">FLASK_ENV</span><span class="o">=</span><span class="n">development</span> <span class="n">superset</span> <span class="n">run</span> <span class="o">-</span><span class="n">p</span> <span class="mi">8088</span> <span class="o">--</span><span class="n">with</span><span class="o">-</span><span class="n">threads</span> <span class="o">--</span><span class="n">reload</span> <span class="o">--</span><span class="n">debugger</span>
</code></pre></div>
<p>Flask will run on port 8088 here because of the <code>-p 8088</code> flag (Flask's own default is 5000). We can change the port number by changing the value we put in the invocation.</p>
<p><img alt="Flask Server" src="/images/flask_server.png"></p>
<p>We're shown a somewhat incomplete and outdated login screen. This is interesting.</p>
<p><img alt="Superset Login" src="/images/superset_login.png"></p>
<p>My guess here is that somewhere, the frontend assets need to be built. This seems to align with the comments listed before the flask server initialization instructions:</p>
<div class="highlight"><pre><span></span><code># <span class="nv">Start</span> <span class="nv">the</span> <span class="nv">Flask</span> <span class="nv">dev</span> <span class="nv">web</span> <span class="nv">server</span> <span class="nv">from</span> <span class="nv">inside</span> <span class="nv">your</span> <span class="nv">virtualenv</span>.
# <span class="nv">Note</span> <span class="nv">that</span> <span class="nv">your</span> <span class="nv">page</span> <span class="nv">may</span> <span class="nv">not</span> <span class="nv">have</span> <span class="nv">CSS</span> <span class="nv">at</span> <span class="nv">this</span> <span class="nv">point</span>.
# <span class="nv">See</span> <span class="nv">instructions</span> <span class="nv">below</span> <span class="nv">how</span> <span class="nv">to</span> <span class="nv">build</span> <span class="nv">the</span> <span class="nv">front</span><span class="o">-</span><span class="k">end</span> <span class="nv">assets</span>.
</code></pre></div>
<p>Let's save the frontend for Day 3!</p>Apache Superset from Scratch: Day 1 (Python Setup)2021-12-23T10:20:00-05:002021-12-23T10:20:00-05:00Srini Kadamatitag:None,2021-12-23:/apache-superset-from-scratch-day-1-python-setup.html<p>I'm on a quest to understand and map out as much of the <a href="https://superset.apache.org/">Apache Superset</a> code base as I can. In my <a href="https://linkedin.com/in/srinivasakadamati">day job</a>, I have the opportunity to <em>use</em> Superset on a daily basis but I'm not intimately familiar with the code paths themselves. This series will revolve around the process on an M1 Macbook Air, but should generalize to most *nix systems.</p>
<p>My goal is to make noticeable progress on a daily basis. With the preamble out of the way, let's start!</p>
<h3>Contributing.md</h3>
<p>The Superset codebase is large; where does one even begin? For new code bases, I generally like alternating between:</p>
<ul>
<li><em>breadth</em>: starting with an overview of the development / contributor's guide</li>
<li><em>depth</em>: recursively going through each component & sub-component</li>
</ul>
<p>For breadth, I'll start with the <a href="https://github.com/apache/superset/blob/master/CONTRIBUTING.md#setup-local-environment-for-development">Setup Local Environment for Development</a> section from CONTRIBUTING.MD.</p>
<h3>Python 3.8</h3>
<p>Python 3.7.x or 3.8.x is recommended for running the Superset backend. I'm on a Mac, and prefer to leave the default <code>python</code> that ships with the operating system at 2.7.x. Instead, I'll use <a href="https://brew.sh/">Homebrew</a> to install Python 3.8:</p>
<div class="highlight"><pre><span></span><code><span class="n">brew</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="n">python</span><span class="mf">@3.8</span><span class="w"></span>
</code></pre></div>
<p>Now, both the <code>python3</code> and <code>pip3</code> commands work as expected (independent of the <code>python</code> and <code>pip</code> commands)!</p>
<ul>
<li><code>python3 --version</code> returns<ul>
<li><code>Python 3.8.12</code></li>
</ul>
</li>
<li><code>pip3 --version</code> returns<ul>
<li><code>pip 21.2.4 from /opt/homebrew/lib/python3.8/site-packages/pip (python 3.8)</code></li>
</ul>
</li>
</ul>
<h3>Virtualenv</h3>
<p>Now it's time to create a Python virtual environment. A virtual environment is really a sandbox for your Python libraries that lives within a specific folder / project. This workflow gives you a few benefits:</p>
<ul>
<li>The virtual environment lives completely independently of the global Python sandbox</li>
<li>It's super quick and easy to delete all of the project specific Python libraries and re-install, as an escape hatch</li>
<li>Less time wasted (not zero sadly) dealing with version / dependency conflicts</li>
</ul>
<p>Are there any downsides?</p>
<ul>
<li>The main one is increased storage requirements, because every Python project on your computer has its own copies of similar libraries</li>
</ul>
<p>First, let me install <code>virtualenv</code>:</p>
<div class="highlight"><pre><span></span><code>pip3 install virtualenv
</code></pre></div>
<p>Next, let's give our virtual environment a name. The <code>virtualenv</code> tool creates a folder within your project folder and stuffs all of the Python libraries you install there. So we're really trying to decide on the <em>name</em> of this folder.</p>
<p>The CONTRIBUTING.MD file in the Superset repo suggests naming it <code>venv</code>:</p>
<div class="highlight"><pre><span></span><code>python3 -m venv venv
</code></pre></div>
<ul>
<li>The first <code>venv</code> is the name of Python's built-in <code>venv</code> module, which does much the same job as <code>virtualenv</code></li>
<li>The second <code>venv</code> refers to the name of the folder we're creating (<code>../superset/venv/</code>)</li>
</ul>
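<p>Because the <code>venv</code> after <code>-m</code> names a module in Python's standard library, the same thing can be done from Python code. Here's a minimal sketch that builds an environment in a temporary folder. The temporary location and the <code>with_pip=False</code> shortcut are my choices to keep the demo quick and tidy, not something the Superset guide asks for.</p>

```python
import os
import tempfile
import venv

# Create the environment in a temporary folder so this sketch leaves no mess;
# in the Superset checkout the target would simply be "venv".
target = os.path.join(tempfile.mkdtemp(), "venv")

# with_pip=False keeps the demo fast; `python3 -m venv` installs pip by default.
venv.create(target, with_pip=False)

# Every environment gets a pyvenv.cfg file that points back at the base Python.
print(os.path.exists(os.path.join(target, "pyvenv.cfg")))
```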
<p>Why should we name it <code>venv/</code>? One hint is in the <code>.gitignore</code> file, which <a href="https://git-scm.com/docs/gitignore">specifies files & folder paths to ignore in version control</a>. This means that each user can have their own local state and those details won't get checked into version control. </p>
<p>The <code>.gitignore</code> file itself <em>is</em> version controlled though. So this file provides a "universal" agreement between all of the contributors to Superset that these files should not be checked into version control. Let's search for any string values containing "env" in the <code>.gitignore</code>:</p>
<div class="highlight"><pre><span></span><code>cat .gitignore | grep 'env'
</code></pre></div>
<p>This returns:</p>
<div class="highlight"><pre><span></span><code><span class="na">.env</span>
<span class="na">.envrc</span>
<span class="nf">env</span>
<span class="nf">venv</span><span class="p">*</span>
<span class="nf">env_py3</span>
<span class="nf">envpy3</span>
<span class="nf">env36</span>
<span class="nf">venv</span>
</code></pre></div>
<p>While some open source projects use the <code>.venv/</code> convention for virtualenv, the Superset one uses <code>venv</code> it seems. So this means:</p>
<ul>
<li>we can party in our local <code>venv/</code> and none of those changes will make it into any code PR's we may want to make</li>
<li>if we want to use <code>.venv/</code> instead, the git version control system will detect a change</li>
</ul>
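<p>To double-check those two bullets, here's a rough sketch that tests a few folder names against the patterns from the grep output. It uses Python's <code>fnmatch</code> module, which is only an approximation of real <code>.gitignore</code> matching (no handling of slashes, negation, or nested paths), and the <code>is_ignored()</code> helper is my own.</p>

```python
from fnmatch import fnmatchcase

# The patterns that the grep above returned from Superset's .gitignore.
patterns = [".env", ".envrc", "env", "venv*", "env_py3", "envpy3", "env36", "venv"]


def is_ignored(folder_name: str) -> bool:
    """Approximate check: does any pattern match this top-level name?"""
    return any(fnmatchcase(folder_name, pattern) for pattern in patterns)


for name in ["venv", "venv38", ".venv"]:
    print(name, is_ignored(name))
```

<p>Under this approximation, <code>venv</code> and <code>venv38</code> match a pattern while <code>.venv</code> does not, which lines up with the bullets above.</p>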
<p>Let's stick to the community convention, and run the suggested command:</p>
<div class="highlight"><pre><span></span><code>python3 -m venv venv
</code></pre></div>
<p>If we run <code>ls</code> while within the <code>superset/</code> folder, we'll see <code>venv</code> listed as a folder. Success!</p>
<h3>Python Dependencies</h3>
<p>Usually, the Python requirements are specified in a <code>requirements.txt</code> file. In the case of Superset, we're blessed with a folder of <code>.in</code> and <code>.txt</code> files. There's a lot we could explore and unpack here, but I'm going to focus on getting everything set up first.</p>
<p>If we look to CONTRIBUTING.MD, we see:</p>
<div class="highlight"><pre><span></span><code>pip install -r requirements/testing.txt
</code></pre></div>
<p>If we open that file, we see something that resembles a standard <code>requirements.txt</code> file, but with this header:</p>
<div class="highlight"><pre><span></span><code># This file is autogenerated by pip-compile-multi
</code></pre></div>
<p>I've made a mental note to investigate & explore <code>pip-compile-multi</code> later, a library for compiling multiple requirement files. For now, let's run the following command to install the dependencies:</p>
<div class="highlight"><pre><span></span><code>pip3 install -r requirements/testing.txt
</code></pre></div>
<p><strong>Error 1: MySQL</strong></p>
<p>I ran into this issue with red scary error text while on my M1 Macbook computer:</p>
<div class="highlight"><pre><span></span><code><span class="n">Collecting</span> <span class="n">mysqlclient</span><span class="o">==</span><span class="mf">2.1.0</span>
<span class="n">Using</span> <span class="n">cached</span> <span class="n">mysqlclient</span><span class="o">-</span><span class="mf">2.1.0</span><span class="o">.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span> <span class="p">(</span><span class="mi">87</span> <span class="n">kB</span><span class="p">)</span>
<span class="n">ERROR</span><span class="p">:</span> <span class="n">Command</span> <span class="n">errored</span> <span class="n">out</span> <span class="k">with</span> <span class="n">exit</span> <span class="n">status</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">command</span><span class="p">:</span> <span class="o">/</span><span class="n">opt</span><span class="o">/</span><span class="n">homebrew</span><span class="o">/</span><span class="n">opt</span><span class="o">/</span><span class="n">python</span><span class="o">@</span><span class="mf">3.8</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">python3</span><span class="mf">.8</span> <span class="o">-</span><span class="n">c</span> <span class="s1">'import io, os, sys, setuptools, tokenize; sys.argv[0] = '</span><span class="s2">"'"</span><span class="s1">'/private/var/folders/6d/f0fzvlyn6sd58q5rmx6s6df00000gn/T/pip-install-6c548wua/mysqlclient_a8c054d3233d4d00acb42d6a6bf2a562/setup.py'</span><span class="s2">"'"</span><span class="s1">'; __file__='</span><span class="s2">"'"</span><span class="s1">'/private/var/folders/6d/f0fzvlyn6sd58q5rmx6s6df00000gn/T/pip-install-6c548wua/mysqlclient_a8c054d3233d4d00acb42d6a6bf2a562/setup.py'</span><span class="s2">"'"</span><span class="s1">';f = getattr(tokenize, '</span><span class="s2">"'"</span><span class="s1">'open'</span><span class="s2">"'"</span><span class="s1">', open)(__file__) if os.path.exists(__file__) else io.StringIO('</span><span class="s2">"'"</span><span class="s1">'from setuptools import setup; setup()'</span><span class="s2">"'"</span><span class="s1">');code = f.read().replace('</span><span class="s2">"'"</span><span class="s1">'</span><span class="se">\r\n</span><span class="s1">'</span><span class="s2">"'"</span><span class="s1">', '</span><span class="s2">"'"</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="s2">"'"</span><span class="s1">');f.close();exec(compile(code, __file__, '</span><span class="s2">"'"</span><span class="s1">'exec'</span><span class="s2">"'"</span><span class="s1">'))'</span> <span class="n">egg_info</span> <span class="o">--</span><span class="n">egg</span><span 
class="o">-</span><span class="n">base</span> <span class="o">/</span><span class="n">private</span><span class="o">/</span><span class="n">var</span><span class="o">/</span><span class="n">folders</span><span class="o">/</span><span class="mi">6</span><span class="n">d</span><span class="o">/</span><span class="n">f0fzvlyn6sd58q5rmx6s6df00000gn</span><span class="o">/</span><span class="n">T</span><span class="o">/</span><span class="n">pip</span><span class="o">-</span><span class="n">pip</span><span class="o">-</span><span class="n">egg</span><span class="o">-</span><span class="n">info</span><span class="o">-</span><span class="mi">0735</span><span class="n">tk4h</span>
<span class="n">WARNING</span><span class="p">:</span> <span class="n">Discarding</span> <span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">files</span><span class="o">.</span><span class="n">pythonhosted</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">packages</span><span class="o">/</span><span class="n">de</span><span class="o">/</span><span class="mi">79</span><span class="o">/</span><span class="n">d02be3cb942afda6c99ca207858847572e38146eb73a7c4bfe3bdf154626</span><span class="o">/</span><span class="n">mysqlclient</span><span class="o">-</span><span class="mf">2.1.0</span><span class="o">.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span><span class="c1">#sha256=973235686f1b720536d417bf0a0d39b4ab3d5086b2b6ad5e6752393428c02b12 (from https://pypi.org/simple/mysqlclient/) (requires-python:>=3.5). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.</span>
<span class="n">ERROR</span><span class="p">:</span> <span class="n">Could</span> <span class="ow">not</span> <span class="n">find</span> <span class="n">a</span> <span class="n">version</span> <span class="n">that</span> <span class="n">satisfies</span> <span class="n">the</span> <span class="n">requirement</span> <span class="n">mysqlclient</span><span class="o">==</span><span class="mf">2.1.0</span> <span class="p">(</span><span class="kn">from</span> <span class="nn">versions</span><span class="p">:</span> <span class="mf">1.3.0</span><span class="p">,</span> <span class="mf">1.3.1</span><span class="p">,</span> <span class="mf">1.3.2</span><span class="p">,</span> <span class="mf">1.3.3</span><span class="p">,</span> <span class="mf">1.3.4</span><span class="p">,</span> <span class="mf">1.3.5</span><span class="p">,</span> <span class="mf">1.3.6</span><span class="p">,</span> <span class="mf">1.3.7</span><span class="p">,</span> <span class="mf">1.3.8</span><span class="p">,</span> <span class="mf">1.3.9</span><span class="p">,</span> <span class="mf">1.3.10</span><span class="p">,</span> <span class="mf">1.3.11</span><span class="n">rc1</span><span class="p">,</span> <span class="mf">1.3.11</span><span class="p">,</span> <span class="mf">1.3.12</span><span class="p">,</span> <span class="mf">1.3.13</span><span class="p">,</span> <span class="mf">1.3.14</span><span class="p">,</span> <span class="mf">1.4.0</span><span class="n">rc1</span><span class="p">,</span> <span class="mf">1.4.0</span><span class="n">rc2</span><span class="p">,</span> <span class="mf">1.4.0</span><span class="n">rc3</span><span class="p">,</span> <span class="mf">1.4.0</span><span class="p">,</span> <span class="mf">1.4.1</span><span class="p">,</span> <span class="mf">1.4.2</span><span class="p">,</span> <span class="mf">1.4.2</span><span class="o">.</span><span class="n">post1</span><span class="p">,</span> <span class="mf">1.4.3</span><span class="p">,</span> <span 
class="mf">1.4.4</span><span class="p">,</span> <span class="mf">1.4.5</span><span class="p">,</span> <span class="mf">1.4.6</span><span class="p">,</span> <span class="mf">2.0.0</span><span class="p">,</span> <span class="mf">2.0.1</span><span class="p">,</span> <span class="mf">2.0.2</span><span class="p">,</span> <span class="mf">2.0.3</span><span class="p">,</span> <span class="mf">2.1.0</span><span class="n">rc1</span><span class="p">,</span> <span class="mf">2.1.0</span><span class="p">)</span>
<span class="n">ERROR</span><span class="p">:</span> <span class="n">No</span> <span class="n">matching</span> <span class="n">distribution</span> <span class="n">found</span> <span class="k">for</span> <span class="n">mysqlclient</span><span class="o">==</span><span class="mf">2.1.0</span>
</code></pre></div>
<p>Some <a href="https://stackoverflow.com/questions/66669728/trouble-installing-mysql-client-on-mac">Stack Overflow sleuthing suggested</a> that I needed to install MySQL server via Homebrew before the Python client library would install. So this may not be an M1-related issue after all:</p>
<div class="highlight"><pre><span></span><code>brew install mysql
</code></pre></div>
<p><strong>Error 2: Postgres</strong></p>
<p>With <code>mysqlclient</code> now installing successfully, pip got stuck on the Postgres driver, <code>psycopg2</code>:</p>
<div class="highlight"><pre><span></span><code><span class="n">Error</span><span class="o">:</span> <span class="n">pg_config</span> <span class="n">executable</span> <span class="n">not</span> <span class="n">found</span><span class="o">.</span>
<span class="n">pg_config</span> <span class="k">is</span> <span class="n">required</span> <span class="n">to</span> <span class="n">build</span> <span class="n">psycopg2</span> <span class="n">from</span> <span class="n">source</span><span class="o">.</span> <span class="n">Please</span> <span class="n">add</span> <span class="n">the</span> <span class="n">directory</span>
<span class="n">containing</span> <span class="n">pg_config</span> <span class="n">to</span> <span class="n">the</span> <span class="n">$PATH</span> <span class="n">or</span> <span class="n">specify</span> <span class="n">the</span> <span class="n">full</span> <span class="n">executable</span> <span class="n">path</span> <span class="k">with</span> <span class="n">the</span>
<span class="n">option</span><span class="o">:</span>
<span class="n">python</span> <span class="n">setup</span><span class="o">.</span><span class="na">py</span> <span class="n">build_ext</span> <span class="o">--</span><span class="n">pg</span><span class="o">-</span><span class="n">config</span> <span class="sr">/path/to/</span><span class="n">pg_config</span> <span class="n">build</span> <span class="o">...</span>
<span class="n">or</span> <span class="k">with</span> <span class="n">the</span> <span class="n">pg_config</span> <span class="n">option</span> <span class="k">in</span> <span class="s1">'setup.cfg'</span><span class="o">.</span>
<span class="n">If</span> <span class="n">you</span> <span class="n">prefer</span> <span class="n">to</span> <span class="n">avoid</span> <span class="n">building</span> <span class="n">psycopg2</span> <span class="n">from</span> <span class="n">source</span><span class="o">,</span> <span class="n">please</span> <span class="n">install</span> <span class="n">the</span> <span class="n">PyPI</span>
<span class="s1">'psycopg2-binary'</span> <span class="k">package</span> <span class="nn">instead.</span>
</code></pre></div>
<p>Let's check out <a href="https://stackoverflow.com/questions/20170895/mac-virtualenv-pip-postgresql-error-pg-config-executable-not-found">Stack Overflow again</a>. I like using the <a href="https://postgresapp.com/">Postgres Mac app</a>, which contains a <code>pg_config</code> executable.</p>
<p>I'm going to find the path to the <code>pg_config</code> executable and add its folder to my PATH. I'll first crack open the Postgres.app folder:</p>
<p><img alt="Opening Postgres.app Folder" src="/images/app_show_package_contents.png"></p>
<p>After jumping through folders, I found the <code>pg_config</code> executable. As suggested on Stack Overflow, I'm going to add that executable's folder to my PATH:</p>
<div class="highlight"><pre><span></span><code><span class="k">export</span> <span class="n">PATH</span><span class="o">=$</span><span class="n">PATH</span><span class="p">:</span><span class="o">/</span><span class="n">Applications</span><span class="o">/</span><span class="n">Postgres</span><span class="o">.</span><span class="n">app</span><span class="o">/</span><span class="n">Contents</span><span class="o">/</span><span class="n">Versions</span><span class="o">/</span><span class="mi">14</span><span class="o">/</span><span class="n">bin</span>
</code></pre></div>
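<p>Before re-running pip, it can help to confirm that the build tools are actually visible on the PATH. Here's a minimal sketch that uses only the Python standard library. The function name <code>find_build_tools</code> is my own, and I'm assuming the <code>mysqlclient</code> build looks for a <code>mysql_config</code> executable, which none of the output above states:</p>

```python
import shutil


def find_build_tools(names=("mysql_config", "pg_config")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    The default names are an assumption: pg_config comes from the psycopg2
    error message, mysql_config is assumed for the mysqlclient build.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing one is easy to spot.
    for name, path in find_build_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

<p>If <code>pg_config</code> shows up as not found, the <code>export</code> above probably didn't take effect in the current shell session.</p>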
<p>Now when I <code>pip3 install -r requirements/testing.txt</code> again, everything works beautifully!</p>
<h3>Editable Superset</h3>
<p>Now, we're ready to install Superset in "editable" mode. Editable mode lets us modify and test code changes in Superset quickly, which is ideal when developing features or fixing bugs.</p>
<div class="highlight"><pre><span></span><code>pip3 install -e .
</code></pre></div>
<p>To test the installation, run the <code>superset</code> command and the Superset CLI should appear:</p>
<p><img alt="Superset CLI" src="/images/superset_cli2.png"></p>
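<p>Another way to sanity-check an editable install is to ask Python where it would import the package from. In editable mode, I'd expect that location to sit inside the cloned repository rather than in <code>site-packages</code>. This is only a sketch: the helper names <code>package_dir</code> and <code>is_imported_from</code> are hypothetical, and it assumes the package is importable under the name <code>superset</code>:</p>

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    # has_location is False for built-in modules and namespace packages.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_imported_from(name, checkout):
    """True if the package resolves to a path at or below the checkout directory."""
    location = package_dir(name)
    if location is None:
        return False
    checkout = Path(checkout).resolve()
    return location == checkout or checkout in location.parents


if __name__ == "__main__":
    # Run from the root of the cloned repository (assumption).
    print("superset resolves to:", package_dir("superset"))
    print("inside this checkout:", is_imported_from("superset", "."))
```

<p>Note that Python puts the current directory on the import path when running a script, so running this from the repository root will find the local package either way. Running it from another directory with the checkout path passed in explicitly is the more convincing test.</p>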
<h3>Next Up</h3>
<p>That's it for Day 1. In Day 2, I'll play with setting up the metadata database, creating roles and permissions, loading example data, and starting the backend server.</p>
<p>If you want to follow along, use the <a href="/feeds/all.atom.xml">RSS feed</a>. Stay tuned! 📺</p>