Projects¶
This section covers documentation for specific projects supported by Deployer.
JupyterHub¶
TACC’s integrated JupyterHub project extends the community-supported JupyterHub with customizations that enable deeper integration with TACC resources and make them easier to use directly from within running notebook servers.
The main features provided by the TACC JupyterHub are:
- An individual JupyterHub instance for each project, with a customized notebook server image based on the research interests of the community.
- Dedicated notebook server running in a Docker container for each user.
- Persistent storage volumes backed by large-scale storage at TACC and mounted directly to the “permanent” directory within each notebook container.
- Additional volume mounts, customizable through configuration, to provide POSIX interfaces to additional file collections stored on TACC systems.
- Integrated authentication with the TACC OAuth server or another tenant OAuth server.
- TACC CLI, agavepy, and other libraries pre-installed in the user’s notebook server for integrating with other TACC Cloud services.
The CIC group at TACC maintains the core TACC JupyterHub code and automated deployment management in Deployer. The notebook server images are developed by a number of individuals across multiple groups at TACC as well as through contributions by external partners.
Primary Server Requirements and Options¶
The TACC JupyterHub uses Docker Swarm 1.11 to deploy notebook servers as Docker containers on a cluster. It requires a minimum of two servers meeting the following requirements:
- One server with the hub role. This server will run the main JupyterHub application and therefore will require ports 80 and 443 open to the public internet.
- One or more servers with the worker role. These servers will run the user notebook containers.
- Ports must be opened between manager and workers to enable communication. For each server, the optional private_ip attribute may be specified. If it is, communication will happen on this IP. Otherwise, it will happen on ssh_host. If firewalld and/or iptables are running on the hosts, make sure that both the local IP and the 172.17.0.0 subnet (the docker0 bridge) are opened.
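The address-selection rule in the last item can be sketched in Python. This is only an illustration of the documented fallback, not Deployer's actual code: the function name communication_address, the dict representation of a host entry, and the sample addresses are all assumptions.

```python
def communication_address(host):
    """Return the address used for manager/worker traffic for one host.

    `host` is a hypothetical dict mirroring a hosts-file entry. As described
    above, `private_ip` is used when present; otherwise `ssh_host` is used.
    """
    return host.get("private_ip") or host["ssh_host"]


# Made-up example hosts (addresses are from documentation-only ranges).
hub = {"host_id": "hub-1", "ssh_host": "203.0.113.10", "private_ip": "10.0.0.5"}
worker = {"host_id": "worker-1", "ssh_host": "203.0.113.11"}

print(communication_address(hub))     # 10.0.0.5
print(communication_address(worker))  # 203.0.113.11
```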
In addition to a Docker 1.11 Swarm cluster, an NFS management cluster will be constructed from the hub node (NFS server) to all worker nodes (NFS clients). This NFS management cluster is required, as it is used to persist login data from the hub to the workers.
Primary Configuration Requirements and Options¶
The following configuration properties are required.
- tenant - The tenant id this JupyterHub belongs to. Determines the OAuth server to use.
- jupyterhub_tenant_base_url - The base URL for the OAuth server. Note: this property is only required if not using a standard TACC tenant; for standard TACC tenants, Deployer will automatically derive the base URL.
- jupyterhub_host - The domain to serve JupyterHub on (without the protocol); e.g., “jupyter.designsafe-ci.org”.
- jupyterhub_oauth_client_id - The ID for the OAuth client that JupyterHub will use to authenticate users. The JupyterHub requires a valid OAuth client for the tenant being used, and this OAuth client must be registered with the required callbackUrl for the hub, which has the form https://<jupyterhub_host>/hub/oauth_callback. See Additional Considerations below for more details.
- jupyterhub_oauth_client_secret - The corresponding client secret for the OAuth client. Note: it is recommended that the jupyterhub_oauth_client_secret be stored in a .passwords file.
The following configuration properties are optional, though some are strongly encouraged; see below.
- jupyterhub_cert and jupyterhub_key - As strings, the contents of an SSL certificate and key for the domain specified in jupyterhub_host. If these attributes are not specified, self-signed certificates will be supplied, and users will see security warnings in the browser.
- volume_mounts - A list of directories to mount into every user notebook container. Each mount should be a string in the format <host_path>:<container_path>:<mode>, where host_path and container_path are absolute paths and mode is one of ro (read-only) or rw (read-write). Both paths recognize the following template variables:
  - {username} - the username of the logged-in user.
  - {tenant_id} - the tenant id for the JupyterHub.
  - {tas_homeDirectory} - the home directory for the logged-in user in TAS (TACC Accounting System). This template variable can only be used in tenants using the TACC identity provider (see Integrating with Stockyard below).
- jupyterhub_image - Docker image to use for the central JupyterHub program. Defaults to the latest stable version. The image must be available on the public Docker Hub.
- jupyter_user_image - Docker image to use for user notebook servers. If not specified, the latest stable version of taccsciapps/jupyteruser-base will be used.
- jupyteruser_uid and jupyteruser_gid - The uid and gid to run the user notebook servers as; if not specified, the uid and gid created in the Docker image specified by jupyterhub_image will be used unless integration with TAS is enabled (see Integrating with TAS below).
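To make the volume_mounts string format concrete, here is a minimal Python sketch that splits one entry into its three fields and checks the constraints stated above (absolute paths, mode of ro or rw). The names Mount and parse_mount are hypothetical and are not part of Deployer; how Deployer itself validates these strings is not described here.

```python
from collections import namedtuple

# Hypothetical container for the three documented fields of a mount entry.
Mount = namedtuple("Mount", ["host_path", "container_path", "mode"])


def parse_mount(spec):
    """Split '<host_path>:<container_path>:<mode>' and check each part."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated fields: {spec!r}")
    host_path, container_path, mode = parts
    for path in (host_path, container_path):
        if not path.startswith("/"):
            raise ValueError(f"path must be absolute: {path!r}")
    if mode not in ("ro", "rw"):
        raise ValueError(f"mode must be 'ro' or 'rw': {mode!r}")
    return Mount(host_path, container_path, mode)


# Template variables such as {username} are left untouched at this stage.
print(parse_mount("/path/to/nfs/share/{username}:/home/jupyter/my_data:rw"))
```

Checking entries like this before a deployment catches malformed mounts early, rather than when a user's container fails to start.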
Additional Considerations¶
We highlight some additional considerations regarding JupyterHub configuration.
- While volume_mounts is technically optional, at least one mount is needed to provide persistent user data; we generally recommend mounting a user directory on the host (e.g., /path/to/nfs/share/{username}) to a path such as /home/jupyter/my_data inside the container.
- Every host_path in the volume_mounts parameter must exist on all worker nodes (for example, via an NFS share) or container execution will fail. Unless using the NFS management cluster, Deployer assumes these directories have been created already.
- For security purposes, ensure that the jupyterhub_oauth_client_secret is in the .passwords file.
- To generate an OAuth client key and secret for your JupyterHub instance, use a command like the following:
$ curl -u <service_account>:<service_account_password> \
    -d "clientName=<you_pick>&callbackUrl=https://<jupyterhub_host>/hub/oauth_callback" \
    https://<tenant_base_url>/clients/v2
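For readers who script client registration, the following Python sketch shows how the URL and form body of the command above are assembled from the JupyterHub settings. It only builds the request and sends nothing. The function name client_registration_request and the example host names are assumptions for illustration.

```python
from urllib.parse import urlencode


def client_registration_request(tenant_base_url, jupyterhub_host, client_name):
    """Return the (url, form_body) pair corresponding to the curl command."""
    # The callback URL must have the form documented for the hub.
    callback_url = f"https://{jupyterhub_host}/hub/oauth_callback"
    url = f"https://{tenant_base_url}/clients/v2"
    # urlencode percent-encodes the values; the server decodes them back
    # to the same clientName and callbackUrl fields.
    body = urlencode({"clientName": client_name, "callbackUrl": callback_url})
    return url, body


# Hypothetical values; substitute your own tenant base URL and hub domain.
url, body = client_registration_request(
    "api.example.org", "jupyter.example.org", "my_jupyterhub"
)
print(url)
print(body)
```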
Integrating with TAS¶
If the tenant being used for JupyterHub leverages the TACC identity provider (i.e., ldap.tacc.utexas.edu), then JupyterHub can integrate with TAS (the TACC Accounting System) to enable individual notebook servers to run as the uid and gid of the logged-in user. This feature provides the advantages of “vertical single sign-on”; i.e., files created and/or updated by the user in the notebook server will have the correct ownership properties. Integrating with TAS is required for integrating with Stockyard, TACC’s global file system (see Integrating with Stockyard below).
Integrating with TAS also requires the following configurations:
- use_tas_uid and use_tas_gid - Setting these to true instructs JupyterHub to launch the user’s notebook with the uid and gid for the user in TAS.
- tas_role_account and tas_role_pass - An account and password in TAS for JupyterHub to use to make TAS API calls.
Integrating with Stockyard¶
JupyterHub instances can integrate with TACC’s global file system, Stockyard, if TAS integration has been enabled. This option is only available to approved JupyterHub instances deployed within the secured TACC network.
In order to provide file system mounts from Stockyard into user notebook servers, a Lustre mount to Stockyard must be made on all worker nodes.
Once the Lustre mounts have been created on the worker nodes, the only configuration required is to add notebook container mounts to Stockyard using the volume_mounts parameter. For example, if Stockyard is mounted at /work on all worker nodes, creating a mount with the following config
/work/{tas_homeDirectory}:/home/jupyter/tacc-work:rw
would mount the user’s $WORK directory at /home/jupyter/tacc-work in read-write mode within the container.
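The substitution of template variables in a mount entry can be illustrated with a short Python sketch. This is an assumption about how the expansion behaves, written as plain string replacement; it is not the JupyterHub code itself. The function name expand_mount and the sample user values (jdoe, designsafe, 01234/jdoe) are made up.

```python
def expand_mount(spec, username, tenant_id, tas_home_directory=None):
    """Fill the documented template variables in one volume_mounts entry."""
    values = {"username": username, "tenant_id": tenant_id}
    # {tas_homeDirectory} is only available when the tenant uses the TACC
    # identity provider, so it is optional in this sketch.
    if tas_home_directory is not None:
        values["tas_homeDirectory"] = tas_home_directory
    expanded = spec
    for name, value in values.items():
        expanded = expanded.replace("{" + name + "}", value)
    return expanded


print(expand_mount(
    "/work/{tas_homeDirectory}:/home/jupyter/tacc-work:rw",
    username="jdoe",
    tenant_id="designsafe",
    tas_home_directory="01234/jdoe",
))
# /work/01234/jdoe:/home/jupyter/tacc-work:rw
```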
Support for Project Mounts (DesignSafe)¶
As an example of custom functionality that can be added for specific JupyterHub instances, we describe the support for project mounts in the DesignSafe tenant.
Coming soon…
Abaco API¶
Coming soon…
Agave API¶
TACC’s Agave API provides services for enabling computational experiments on HPC and HTC resources. For more information on the API, see TACC’s official documentation.
The Agave API is currently organized into frontend services and worker components. These components interact through a persistence layer composed of MongoDB and MySQL databases and Beanstalk and RabbitMQ queues. All Agave components are packaged into Docker images and configured through environment variables.
The Agave services are secured with JWT authentication, and some aspects of authorization are handled through JWT claims. The services are built to integrate with the TACC OAuth Gateway, though in theory, any API Gateway capable of generating a conformant JWT could work.
Minimal Quickstart¶
Agave can be deployed in a “sandbox” configuration for evaluation and/or prototyping purposes with minimal configuration. When using this setup, Deployer installs and runs all required Agave databases and automatically wires up the Agave services to those databases.
Note
The minimal quickstart does not include settings for changing the default database passwords, so it is insecure and should not be used in production!
The minimal quickstart requires three servers (or VMs): one for the OAuth Gateway, one for the Agave components, and a third for the persistence layer. A hosts file and a config file will be required. The following samples can be used as a starting point for deploying a complete sandbox, including all Agave components and databases and the OAuth Gateway:
Hosts File Sample¶
ssh_user: root
ssh_key: my_ssh.key

oauth.sandbox.dev:
    hosts:
        - host_id: oauth-sbox
          ssh_host: <IP_1>
          roles:
              - oauth
        - host_id: oauth-dbs-sbox
          ssh_host: <IP_3>
          roles:
              - all_oauth_dbs

agave.sandbox:
    hosts:
        - host_id: ag-sbox-all
          ssh_host: <IP_2>
          roles:
              - agave_frontends
              - agave_workers
        - host_id: ag-sbox-dbs
          ssh_host: <IP_3>
          roles:
              - all_agave_dbs
Notes:
- Replace <IP_1>, <IP_2>, <IP_3> with actual IPs or hostnames for your servers. Deployer must be able to SSH to these IPs to install the software.
- We have used sandbox as the instance identifier; this can be changed, as desired.
- We have used dev as the tenant identifier; just as with the instance identifier, this can be changed.
Configs File Sample¶
oauth.sandbox.dev:
    base_url: dev.api.example.com
    agave_frontend_host: <IP_2>
    use_cic_ldap: True
Notes:
- We have not specified any Agave configs, as technically they are not required for a minimal setup. However, without setting, at a minimum, the database password configs, the deployment will not be secure.
- The value of base_url (dev.api.example.com in the example above) will be the primary URL for all APIs. This should be changed to a domain owned by the organization.
- Using the above config, the OAuth Gateway will be deployed with self-signed certificates. See the OAuth Gateway section for additional configuration options, including deploying with valid certificates.
A Word on Ports¶
The OAuth Gateway and Agave projects communicate with the various databases on specific ports. Therefore, the database ports on <IP_3> must be reachable from <IP_1> and <IP_2>. If that is not possible using the ssh_host value configured in the hosts file, separate configs can be provided to specify the IP to use for each database (e.g., agave_mysql_host); see Service and Host Configs for a complete list.
For example, when using a cloud provider such as OpenStack, it is often possible to assign servers an IP on a private network and for the OAuth Gateway and Agave services to use that IP for communication to the databases.
Deploying the Sandbox¶
Use the Deployer CLI to deploy the sandbox with two commands. First, deploy the Agave project as follows:
$ deployer -p agave -i sandbox -a deploy
Next, deploy the OAuth Gateway project, which requires specifying the tenant in addition to the instance:
$ deployer -p oauth -i sandbox -t dev -a deploy
Note that these commands do not explicitly specify the hosts and configs files to use. Deployer will use the first file it finds with extension .hosts (respectively, .configs) in the current working directory. If you have multiple hosts or configs files in the current working directory, specify the correct one using the -s (respectively, -c) flag. See the User Guide for more details.
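The default-file lookup described above can be approximated with the following Python sketch, which is useful for checking which file would be picked up before running a deployment. It is not Deployer's implementation: the function name find_default_file is hypothetical, and treating "first" as first in sorted name order is an assumption, since the ordering is not specified here.

```python
from pathlib import Path


def find_default_file(directory, extension):
    """Return the first file in `directory` ending in `extension`, or None.

    "First" is taken to mean first in sorted name order; that ordering is
    an assumption made so the result is deterministic.
    """
    matches = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(extension)
    )
    return matches[0] if matches else None


# Look in the current working directory, as Deployer is described to do.
hosts_file = find_default_file(".", ".hosts")
configs_file = find_default_file(".", ".configs")
print(hosts_file, configs_file)
```

If more than one candidate exists, passing the file explicitly with -s or -c avoids depending on any particular ordering.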
Service and Host Configs¶
Like other Deployer projects, Agave deployments leverage settings specified on hosts, either through special roles or other attributes, and global settings specified through the configs file. The configs object should contain an instance identifier and any Agave attributes to apply to all services in the instance.
At a minimum, the following global configs must be specified:
- agave_mysql_host - Hostname or IP address for the MySQL database.
- agave_mysql_user - MySQL user to use.
- agave_mysql_password - Password associated with the MySQL user.
- mysql_root_user - MySQL root user; used to create schemas, etc.
- mysql_root_password - Password associated with the MySQL root user.
- agave_mongo_host - Hostname or IP address for the MongoDB database.
- agave_mongo_user - MongoDB user to use.
- agave_mongo_password - Password associated with the MongoDB user.
- mongo_admin_user - Admin MongoDB user; used to create collections, indexes, etc.
- mongo_admin_password - Password associated with the admin MongoDB user.
- agave_beanstalk_host - Hostname or IP address for the Beanstalk instance.
- agave_rabbitmq_host - Hostname or IP address for the RabbitMQ instance.
Note: the database host attribute will be derived automatically if a host in the hosts file has the corresponding role, e.g., agave_mysql or all_agave_dbs.
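The role-based derivation in the note above can be pictured with a small Python sketch for the MySQL case. This is a guess at the behavior, not Deployer's code: the function name derive_mysql_host is hypothetical, and both the precedence of an explicit agave_mysql_host config over the role lookup and the use of ssh_host as the derived value are assumptions.

```python
def derive_mysql_host(hosts, configs):
    """Return the MySQL host: explicit config first, then role-based lookup."""
    # Assumption: an explicitly configured value wins over derivation.
    if "agave_mysql_host" in configs:
        return configs["agave_mysql_host"]
    for host in hosts:
        roles = host.get("roles", [])
        # Roles named in the note above.
        if "agave_mysql" in roles or "all_agave_dbs" in roles:
            return host["ssh_host"]
    return None


# Hosts modeled on the sandbox sample; the addresses are made up.
hosts = [
    {"host_id": "ag-sbox-all", "ssh_host": "10.0.0.2",
     "roles": ["agave_frontends", "agave_workers"]},
    {"host_id": "ag-sbox-dbs", "ssh_host": "10.0.0.3",
     "roles": ["all_agave_dbs"]},
]

print(derive_mysql_host(hosts, {}))                                # 10.0.0.3
print(derive_mysql_host(hosts, {"agave_mysql_host": "10.1.0.9"}))  # 10.1.0.9
```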
Agave Roles¶
The following roles can be set on a per-host basis to deploy specific components of Agave on a given server.
- agave_frontends - Run a set of Agave frontend services. Looks for the agave_frontends attribute to determine which services to run. If that attribute is not defined, all frontend services will be run.
- agave_workers - Run a set of Agave workers. Looks for the agave_workers attribute to determine which workers to run. If that attribute is not defined, all workers will be run.
OAuth Gateway¶
The TACC OAuth Gateway provides two primary functions: 1) an OAuth2 provider server and 2) an API Gateway and reverse proxy to APIs registered with the server. When a request is made to a registered API using an OAuth access token, the API Gateway function will generate a JWT corresponding to the identity information contained within the token before forwarding the request to the actual service.