CSI: STScI Work Plan
January 13, 2017
Introduction
The overarching goal of the Community Software Initiative (CSI) at STScI is to enable a
broad range of scientific inquiry by fostering the Python infrastructure for astronomical data
analysis. Our aim is to serve the broader astronomical community by coordinating its efforts with
the institute mission and AURA-supported projects. This work plan outlines the three directions
in which we will focus our effort, describes the project governance and organization, and details
the distribution of work for Year 1. Plans for Years 2 and 3 are sketched out.
To accomplish our main goal, we will focus our efforts in three main directions (or planks):
1. Infrastructure (“Infrastructure and code maintenance” in the language of the STScI
Astronomy Software coordination effort): provide organizational, programming, and
financial support to develop and maintain key pieces of astronomical software
infrastructure. This effort will focus on fundamental libraries and routines that are unique
to astronomy and do not have a user base outside the field (e.g., dealing with celestial
coordinate systems, FITS files, or software packaging tools to simplify sharing of
astronomy codes). These infrastructural building blocks are crucial for all CSI
stakeholders and will have a broad impact on our future capabilities to support scientific
data analysis and reduction software.
2. Emerging efforts (“Requirements, Specifications & Prototypes”): provide infrastructure,
logistical support and guidance for the development of new Python software libraries to
fill missing pieces of the current ecosystem. Following the Astropy model of including the
astronomical community in the development of software, we will identify and nurture
efforts to fill these gaps as well as provide standards for documentation, input/output,
testing, etc. to ensure contributed libraries play well with each other.
3. Community engagement (“Training, Documentation and Community Engagement”):
ensure the above effort is accessible and useful to the whole astronomy community.
This will involve soliciting input on priorities, creating educational materials, and
developing manuals for high-level science workflows. Through this plank we will serve
the large portion of the astronomical community that does not develop software
independently, but simply needs accessible tools to carry out scientific research. This
plank also includes outreach to institutions like other AURA centers or large
observatories, considered major stakeholders in the astronomy community.
Team Organization
The work of the CSI team will be directed and overseen by a steering committee. The current
members of the steering committee are Lou Strolger (Admin PI), Erik Tollerud (Science PI),
Harry Ferguson, Perry Greenfield, Megan Sosey, Joshua Peek, and Ivelina Momcheva (project
manager).
Ivelina Momcheva will organize and manage the project directly, coordinating with Viviana
Acosta, who is her counterpart on the STScI Astronomy Software coordination effort. The
steering committee will meet monthly to review progress, discuss new initiatives, assess
priorities, and make high-level decisions. Quarterly, the steering committee will summarize the
project work and solicit input from the advisory board. These reports will also be provided to the
DRF oversight committee and their input will be welcome.
Higher-level oversight of the priorities for CSI will be provided by the advisory board, which will
initially consist of all project co-investigators and representatives from the AURA centers that
provided letters of support (LSST, NOAO, and DKIST). Hence the initial membership will be the
steering committee, Justin Ely, Jonathan Hargis, Susan Kassin, Pey Lian Lim, Nadia Dencheva,
Steven Berukoff (DKIST), Adam Bolton (NOAO), Frossie Economou (LSST), and Bahram
Mobasher (FIELDS).
For communication purposes, the team will use Confluence and, for documents that
non-STScI personnel (i.e., the advisory board) will need to access, Google Docs. Public
announcements will be posted on a dedicated CSI webpage and disseminated through
STScI communication channels.
CSI will maintain contact with other STScI software efforts through two channels: the new STScI
Astronomy Software coordination effort, and direct interactions in SSB. One of the steering
committee members is Harry Ferguson, the project scientist for the coordination effort. The
quarterly meetings will be a specific checkpoint for this effort, although the inclusion of Harry on
the steering committee will ensure his frequent contact with CSI’s efforts. Additionally, CSI will
be making use of SSB staff who are working on related efforts for the missions, and this will
provide direct “on-the-ground” contact between CSI’s efforts and those of the missions. Finally,
the steering committee itself will be charged with reaching out to other software efforts and
communicating the connection between CSI’s priorities and these other efforts (e.g., other
strategic DRF programs, individual STScI scientists’ efforts, etc.).
Year 1 Plan
The Year 1 plan is focused on tasks that will have immediate high impact on both the missions
and the community. The steering committee and advisory board have jointly developed an
extensive priority list for community software STScI is involved in (over 30 distinct software
efforts), and the efforts below are aimed at the items at the top of that priority list, organized by
the plank that they are focused on addressing. Items further down the priority list may be
addressed if the efforts outlined below run ahead of schedule.
The CSI project has been allocated funds to support the efforts of 2.25 FTEs per year for three
years. Each member of the steering committee will have 0.05 FTEs available for general
organizational work as well as participation in several of the planned activities. The remaining
1.9 FTEs will be allocated across a number of current or incoming STScI employees.
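The allocation arithmetic in the paragraph above can be checked with a short sketch. It assumes the steering committee has the seven members named under Team Organization; the variable names are illustrative only.

```python
from fractions import Fraction

# Figures quoted in the work plan text.
TOTAL_FTE_PER_YEAR = Fraction("2.25")
FTE_PER_COMMITTEE_MEMBER = Fraction("0.05")

# Assumption: the seven steering committee members listed under Team Organization.
STEERING_COMMITTEE_SIZE = 7

# Exact rational arithmetic avoids floating-point rounding in the comparison.
committee_fte = STEERING_COMMITTEE_SIZE * FTE_PER_COMMITTEE_MEMBER
remaining_fte = TOTAL_FTE_PER_YEAR - committee_fte

print(f"Steering committee: {float(committee_fte):.2f} FTE")
print(f"Remaining for staff: {float(remaining_fte):.2f} FTE")
```

Under that assumption, the steering committee share plus the remaining staff allocation reproduces the 2.25 FTE per year stated above.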
1. Infrastructure:
○ Support and development of astropy.io.fits: 0.2 FTE from CSI + 0.3
FTE from mission. The FITS data format is used by all past, current, and likely
future missions. The package is crucial for our ability to simply access most
astronomical data. However, the package is currently entirely without dedicated
support and is not undergoing active development to keep up with mission
needs. As a result, a backlog of unresolved issues has accumulated over the last
year (“issues” are bug reports, changes to the codebase, or requests for
expanded functionality). There are currently 150 unresolved issues and this
number is growing. astropy.io.fits currently needs intensive development
because the rest of the ecosystem is evolving; however, once key features have
been implemented, only maintenance efforts will be required. Such efforts will
require a much lower level of support and can ultimately be provided by
members of the community. We estimate that 0.5 FTE is
necessary in Year 1 to get a person up to speed, resolve urgent issues,
prioritize non-urgent ones, and create a plan for future developments. This
should be a single individual’s effort, as the package is complex enough to
require specialized expertise, which has been lacking at STScI since the departure of Erik Bray.
The milestones for this effort are the following:
■ 2 months of effort for getting up to speed on the package
■ 0.5 months of effort sorting issues
■ 3 months of effort resolving urgent issues
■ 0.5 months of effort prioritizing non-urgent issues and developing
recommendations for future work.
○ Support and development for astropy.wcs: 0.2 FTE, shared w/ gwcs.
Similar to astropy.io.fits, the astropy.wcs subpackage is a critical
underpinning of present-day data analysis, as it provides a way to translate pixel
coordinates into celestial coordinates. As with astropy.io.fits, there is a
growing backlog of issues that require attention (>~40), as well as a modest
amount of additional development needed to support new efforts in the astropy
ecosystem. This work also needs careful coordination with the gwcs efforts
detailed below, as gwcs is essentially the successor to astropy.wcs. The
milestones for this effort are the following:
■ Design and implementation of an interface to gwcs that is drop-in
compatible with a comparable interface to astropy.wcs.
■ (Ongoing) Address bug/issue reports provided by the community.
Progress will be measured by the number of closed issues. The goal of
Year 1 is to reduce the overall number of issues by half, to a steady state
of 20.
○ Exploratory efforts for astroquery.mast: 0.1 FTE, contingent on 0.1 FTE
support from the Archive Branch. We consider the development of API
support for the STScI archive a high priority; however, this needs to be done in
collaboration with the Archive, which has the expertise. Specifically, API access to
the Hubble Source Catalog, the Hubble Spectroscopic Legacy Archive, and a
cutouts server will be of tremendous use to both the ST and the greater
astronomical community. 0.1 FTE from the members of the steering committee
will be allocated to exploratory efforts to document existing APIs to MAST
archives and to work with the Data Science Mission Office to understand the
long-term needs of the institute. The CSI steering committee can consult on any API
planning to ensure that such effort is done with an eye towards an
astroquery.mast expansion. The milestones for this effort are the following:
■ Invite ASB and DSMO leaders and developers to a joint working group to
meet monthly over the course of 3 to 6 months in order to produce
overview documentation of API capabilities.
■ ASB will be expected to produce detailed documentation of these
capabilities, i.e., which API access points exist internally, which are open to
the outside, and what their syntax is (3-6 months).
■ CSI will make recommendations about the development of
astroquery.mast. FTEs are currently not allocated to this effort, but
the steering committee will reconsider the issue.
Figure 1: Planned Milestones for the infrastructure development in Year 1
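To illustrate the pixel-to-celestial translation that the astropy.wcs item above refers to, here is a minimal plain-Python sketch. It is not the astropy.wcs API: the function pixel_to_sky and its parameters are hypothetical, named after the FITS keywords CRPIX, CRVAL, and CDELT, and it assumes a gnomonic (TAN) projection with no rotation or distortion terms.

```python
import math


def pixel_to_sky(px, py, crpix, crval, cdelt):
    """Convert 1-based pixel coordinates to (RA, Dec) in degrees.

    Hypothetical helper, assuming a gnomonic (TAN) projection with no
    rotation or distortion:
      crpix: reference pixel (x, y)
      crval: (RA, Dec) in degrees at the reference pixel
      cdelt: pixel scale in degrees per pixel along (x, y)
    """
    # Pixel offsets from the reference pixel, scaled to tangent-plane
    # coordinates (xi toward increasing RA, eta toward increasing Dec), in radians.
    xi = math.radians(cdelt[0] * (px - crpix[0]))
    eta = math.radians(cdelt[1] * (py - crpix[1]))

    ra0 = math.radians(crval[0])
    dec0 = math.radians(crval[1])

    # Inverse gnomonic projection about the tangent point (ra0, dec0).
    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0), math.hypot(xi, denom))

    return math.degrees(ra) % 360.0, math.degrees(dec)


# Illustrative values only: a 1 arcsec/pixel image with RA increasing to the left.
crpix = (512.0, 512.0)
crval = (150.0, 2.0)
cdelt = (-1.0 / 3600.0, 1.0 / 3600.0)

ra_ref, dec_ref = pixel_to_sky(512.0, 512.0, crpix, crval, cdelt)
ra_up, dec_up = pixel_to_sky(512.0, 513.0, crpix, crval, cdelt)
ra_right, dec_right = pixel_to_sky(612.0, 512.0, crpix, crval, cdelt)
```

The reference pixel maps back to the reference coordinates, one pixel up raises the declination by about one pixel scale, and moving right lowers the right ascension because CDELT1 is negative in this example. Real instruments need the rotation, distortion, and alternative projections that astropy.wcs and gwcs handle.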
2. Emerging efforts:
○ Develop spectroscopic utilities library specutils: 0.5 FTE. While a
substantial effort has borne fruit in developing infrastructure for imaging data
reduction and analysis in Python (via the photutils package), consistent
support for spectroscopy is much more rudimentary. While a range of
domain-specific tools now exist in the ecosystem, they are generally
incompatible and only suited to a limited set of instruments or datasets. The
Astropy-supported specutils effort has aimed to address this, but its
development has not been well-coordinated enough to achieve its main goal of
being the “gateway” between spectroscopic reduction pipelines and
spectroscopic analysis tools. CSI will jump-start this effort by providing support
to the newly formed Astropy “Spectroscopic Coordination Committee” (SCC), which
aims to unify these efforts. The CSI effort will be devoted primarily to developing
the interface that specutils will use to connect with data analysis tools
(particularly those developed for JWST spectroscopic instruments) and
instrument-provided data reduction pipelines (like the specreduce package
produced for the Southern African Large Telescope). This effort is expected to take
~6 months, including a small workshop at STScI (see below), which will be
followed by development of basic spectroscopic analysis algorithms that will be
compatible with specutils, like emission line flux measurements or spectral
rebinning/interpolation algorithms. The milestones for this effort are:
■ Assign an STScI lead who will join the Astropy Spectroscopic
Coordination Committee.
■ This lead, in consultation with the SCC, will help develop an APE
(“Astropy Proposal for Enhancement”) to lay out the requirements for a
shared spectrum class and interface that allows data reduction tools to
connect to community- and JWST-developed spectroscopic analysis tools.
■ This APE requires extensive consultation with the wider community to
ensure it works well for other software-focused spectroscopic efforts. A
workshop (discussed further below) will be held at STScI to get this input
and finalize the design of the APE.
■ With the APE agreed on, the team will then work to implement the actual
code in the specutils package. We expect the majority of the effort to
be spent in this part of the plan.
■ In Years 2 & 3: The CSI lead will oversee a new release of specutils
which includes this work. The remainder of the effort will be focused on
adding baseline analysis tools to specutils ; e.g., equivalent width
measurements, interpolation, or standard star flux calibration tools. The
priorities for this final effort will be set by the CAT effort described below.
○ Community-focused development of gwcs: 0.2 FTE, shared
w/ astropy.wcs. New missions (including JWST) have instruments that are
difficult or impossible for the FITS-WCS format to support for mapping pixel
coordinates to celestial or spectral coordinates. The gwcs (for general WCS)
package is an effort begun at STScI to address these deficiencies. However, it
currently lacks a community user base, in large part because the development
has focused on JWST’s needs without much effort towards documentation and
community outreach. CSI’s effort would go towards just this, leveraging STScI’s
already existing investment in gwcs. The milestones for this effort are:
■ Investigation of specific workflows that gwcs provides to the wider
community’s science applications. (This effort would be in part assisted by
the CAT development described below.)
■ Development of complete examples with accompanying narrative
documentation for the workflows identified above.
■ Design and implementation of an interface to gwcs that is drop-in
compatible with a comparable interface to astropy.wcs.
■ (Ongoing) Address bug reports that overlap with the development efforts.
○ Development of astropy-helpers and support for its STScI users: 0.3
FTE. The astropy-helpers framework is a critical part of the astropy
ecosystem which provides astronomers with a way to package their Python code
with little need to understand the intricacies of Python packaging tools.
Figure 2: Planned Milestones for the emerging efforts in Year 1
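As an illustration of the kind of baseline analysis tool mentioned for specutils above (e.g., equivalent width measurements), the following is a minimal plain-Python sketch. It is not the specutils API: the function equivalent_width is hypothetical, and it assumes a known, flat continuum level and simple trapezoidal integration over a synthetic line.

```python
import math


def equivalent_width(wavelengths, fluxes, continuum):
    """Equivalent width: the integral of (1 - F/Fc) over wavelength.

    Hypothetical helper using the trapezoidal rule; assumes a constant
    continuum level Fc. The result is positive for absorption lines and
    has the same units as the wavelengths.
    """
    depth = [1.0 - flux / continuum for flux in fluxes]
    total = 0.0
    for i in range(1, len(wavelengths)):
        step = wavelengths[i] - wavelengths[i - 1]
        total += 0.5 * (depth[i] + depth[i - 1]) * step
    return total


# Synthetic Gaussian absorption line on a flat continuum (illustrative values).
continuum = 1.0
center, sigma, line_depth = 6563.0, 1.5, 0.4
wavelengths = [6533.0 + 0.05 * i for i in range(1201)]
fluxes = [
    continuum * (1.0 - line_depth * math.exp(-0.5 * ((w - center) / sigma) ** 2))
    for w in wavelengths
]

ew = equivalent_width(wavelengths, fluxes, continuum)

# For a Gaussian line the integral has a closed form, used here as a check.
analytic_ew = line_depth * sigma * math.sqrt(2.0 * math.pi)
print(f"numerical EW = {ew:.4f}, analytic EW = {analytic_ew:.4f}")
```

A community implementation would also need continuum fitting, uncertainty propagation, and unit handling, which is the sort of shared interface work the APE milestones describe.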
3. Community engagement
○ Community-assisted tutorials (CAT): 0.8 FTE. As a result of the changes in
workflow in the transition from IRAF to Python, many astronomers are concerned
that they will no longer be able to perform simple exploratory tasks. Therefore,
simply creating libraries with functionality needed by astronomers is insufficient.
The goal of the CAT effort is to identify high-level science workflows and write
tutorials using tools from the Python astronomy ecosystem. Such tutorials will
resemble the NOAO Cookbooks but will be written as interactive Jupyter
Notebooks such that users can plug and play with their own data. The purpose of this
effort is two-pronged: (1) to provide tools for the scientific community to carry out
science in a Python environment and (2) to identify areas where current tools are
missing, need added functionality, or need better documentation. While a complete
effort of this sort likely requires more than 0.8 FTE, the CSI effort will be in part a
coordination role. The tutorials themselves will be created in part by CSI effort and
in part outsourced to members of the community who have the needed
expertise. The willingness to create and use these sorts of tutorials is clearly
demonstrated by the substantial (but scattered) existing set of tutorials for
Astropy and related tools. Tutorial development in the first year will focus on
well-developed science areas such as aperture photometry and related
photometry/image analysis tasks (for which the actual needed functionality
already exists in photutils). Additional target areas include higher-level
science workflows which combine multiple packages or libraries, such as data
mining of large surveys, usage of AstroDrizzle integrated with other tools, and
reduction of single-slit spectra followed by emission line measurements
(contingent on the specutils effort outlined above). Further tutorial topics will
be determined from the suggestions of the advisory board and by the results of
the surveys described below. Milestones for the effort include:
■ Develop a plan for storage of and access to the tutorials, likely a repository
connected to the CSI webpage. Coordinate the plan with the OED IRAF
replacement effort and discuss a common access portal. Further coordination
will be done with the JWST tutorial development effort to ensure there will
be no duplication of work.
■ Determine a standard workflow for developing the tutorials, including at
least one iteration of review by community scientists (i.e.,
non-developers).
■ Develop actual tutorials. The following six cookbooks/tutorials constitute
the baseline plan for Year 1 deliveries (which will expand or change
based on community feedback):
1. Tutorial for high-level workflow for going from raw data from a
standard imaging CCD to aperture photometry of objects in the
field (using photutils, ccdproc, etc.).
2. Tutorial for high-level workflow for going from archival HST data to
aperture photometry of objects in the field.
3. Tutorial for high-level workflow for going from archival HST data to
single-epoch PSF photometry of objects in the field. (May be
integrated with the above tutorial.)
4. Development of high-level workflows for more complex use
cases of AstroDrizzle integrated with the above tutorials. May
include specific parameter sets as useful “defaults” for specific
science cases.
5. Tutorial for high-level workflow of downloading an SDSS dataset
and building color-magnitude diagrams to identify a particularly
interesting subset of galaxies.
6. (Dependent on progress of astroquery.mast efforts:) Tutorial
for high-level workflow of downloading Hubble Source Catalog
datasets and building color-magnitude diagrams to determine
distance to a star cluster.
○ Community Surveys: To better guide the efforts in future years, the steering
committee and project manager will solicit input from the community in several
forms including, but not limited to, advertising the effort at AAS, creating an
ongoing effort to survey the community’s “missing” tools, and supporting
Astropy’s userecho system for determining priorities for the Python astronomy
ecosystem. The steering committee and advisory board will use these tools to
determine continuing priorities in the following years.
○ Spectroscopic Tools Workshop: As described above, one of CSI’s major
efforts for Year 1 focuses on the specutils tool and related infrastructure. To
this end, CSI will host a small workshop dedicated to a face-to-face meeting
between stakeholders for spectroscopic analysis software, including large
institution representatives (e.g., JWST, SDSS, DESI, the Subaru PFS) and
individual community developers who have developed Python packages for
spectroscopic analysis (e.g., J. Xavier Prochaska/linetools, Adam
Ginsburg/pyspeckit, Steve Crawford/specreduce). This workshop will not
require substantial support from SMO conference organizers, but will require
travel support to defray costs for some of the participants who otherwise do not
have budgets for their software work. Also in Year 1, the steering committee will
communicate with the organizers of conferences that bring together software
developers (.Astronomy, Python in Astronomy) and try to bring these events to
STScI in Years 2 and 3.
Figure 3: Planned Milestones for the community engagement in Year 1
The remaining 0.15 FTE will be allocated as needed to support the project managers in
their efforts to organize the CSI effort.
Figure 4: Planned Milestones for the CSI: STScI Year 1
Year 2/3 Plan
Because CSI is focused on community software, a detailed 2nd- and 3rd-year plan is not called
for, as community development often proceeds without coordination: an effort currently lacking
may gain momentum with mild investment from CSI (e.g., the specutils effort described
above), while other areas currently going strong may slow or fail if current key players change
interests or leave the field. Furthermore, specific efforts from STScI missions and the wider
AURA environment may substantially change on a ~1-2 year timescale.
Hence, as detailed above, many of the Year 1 initiatives have some component focused on
exploratory work, to better understand exactly which areas require more targeted investment in
future years. Informed by these efforts, the CSI steering committee will revise our prioritization
effort (described at the start of the Year 1 plan) in the final quarter of 2017, and use those
prioritizations to adjust FTE allocations from completed Year 1 efforts to areas which may
require more targeted effort. Below we provide examples of efforts that the Year 1 prioritization identified
as high-priority, but not high enough to be included in our Year 1 efforts.
● astropy.modeling.models: this astropy subpackage contains physical models
useful for astronomical constructs, like emission line profiles, Point Spread Functions,
blackbody radiation curves, etc. While the baseline framework has been developed (in
substantial part through the work of SSB at STScI), the models subpackage is in need
of a lead developer to drive the remainder of the vision forward. This includes adding
support for unitful model parameters, supporting easier ways of building compound
models, and in particular, providing more high-level documentation demonstrating
practical applications of the framework.
● ci-helpers: this piece of Python infrastructure is related to the astropy-helpers
described above, but is focused on automated testing of astronomy software. This effort
(and related efforts like asv) are critical to the success of community software because
they ensure the software does not develop bugs as the functionality increases. The
effort requires constant supervision, however, as the tools evolve over time, and the
astronomy community is often unaware of how to apply these tools in a scientific context.
ci-helpers provides the framework and documentation to do just this.
● astropy.modeling.fitting: this astropy subpackage complements models by
providing algorithms to fit the models to data. However, this will require substantial
effort to keep in line with any changes to models, and would benefit from further effort
based around applying new statistical techniques.
● Grism data analysis and reduction software: STScI has extensive in-house knowledge
of grism software, including packages like aXe and grizli. These efforts are not
well-coordinated, however, and could be greatly improved with some targeted effort to
make them more intercompatible and documented as well as release them to the
community.
● Extinction laws: The astronomical Python ecosystem currently has no centralized source
for models of extinction or reddening, a key scientific tool for all modern astronomy
outside the solar system. Expertise at STScI (e.g., Karl Gordon) could be leveraged to
provide precisely this primary repository, especially if combined with the work on
astropy.modeling outlined above.
● An additional ~20 software efforts were prioritized on the spreadsheet, but are not
included here for brevity.
Reporting/Success Metrics
As described above, the steering committee will evaluate on a quarterly basis whether the
priorities described above need adjustment, in consultation with the various stakeholders. This
will include review by the STScI Astronomy Software coordination project scientist (Harry
Ferguson). For more fine-grained feedback, the steering committee will also evaluate progress
on each of the above priorities monthly. For the infrastructure priorities that are focused around
individual packages (e.g., astropy.io.fits), this will be primarily focused on GitHub issues or
pull requests (i.e., bug reports closed, new features implemented, etc.). The emerging efforts
and community engagement efforts will by necessity be less fixed in form and milestone. Those
efforts are built on exploratory work that must get feedback from the community, so monthly
reports in some cases will be focused on communication efforts rather than specific milestones
(e.g., “proposed a specutils class design, which was amended by the community and is still
under consideration”). The work plan will be amended as these efforts take shape and more
specific expectations of the effort and its timeline can be set.
Budget
● $852,500: Equivalent of 2.25 FTE per year for 3 years
● $60,000: Support for small workshops at STScI and travel to external workshops
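The budget lines above can be combined in a short sketch. The totals follow directly from the two figures quoted; the average cost per FTE-year is a derived illustration, not a number stated in the plan, and the variable names are illustrative only.

```python
# Figures quoted in the Budget section.
PERSONNEL_USD = 852_500
WORKSHOP_AND_TRAVEL_USD = 60_000
FTE_PER_YEAR = 2.25
YEARS = 3

# Total staffing effort over the life of the project.
total_fte_years = FTE_PER_YEAR * YEARS

# Sum of the two budget lines.
total_budget_usd = PERSONNEL_USD + WORKSHOP_AND_TRAVEL_USD

# Derived quantity (not stated in the plan): average cost of one FTE-year.
cost_per_fte_year_usd = PERSONNEL_USD / total_fte_years

print(f"Total effort: {total_fte_years} FTE-years")
print(f"Total budget: ${total_budget_usd:,}")
print(f"Average cost per FTE-year: ${cost_per_fte_year_usd:,.0f}")
```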
Possible Personnel for Year 1 Efforts
CSI will aim to be involved in efforts to distribute existing personnel, to ensure that we attract
appropriate talent in hiring efforts. Our proposal for utilizing existing personnel to coordinate with
astropy developers (and other external people) on our proposed efforts is as follows:
● astropy.io.fits: Pey Lian Lim (STScI), Matt Craig (MN State), Simon Conseil
(CRAL)
● astropy.wcs/gwcs: Nadia Dencheva (STScI), Simon Conseil (CRAL), Michael Seifert
● astroquery.mast: Jonathan Hargis (STScI)
● specutils: Nick Earl (STScI), Adam Ginsburg (NRAO), Steve Crawford (SALT)
● astropy-helpers: Pey Lian Lim (STScI), Brigitta Sipocz (Cambridge)
● CAT: Pey Lian Lim, Justin Ely, Megan Sosey, Sara Ogaz (STScI), Adrian Price-Whelan
(Princeton), Kelle Cruz (Hunter College/AMNH)
● Community Surveys: Steering committee as a whole, Ivelina Momcheva, Justin Ely, Sara
Ogaz (all STScI)