Figure: Classifier decision tree, showing the classification engines employed as a function of the number of points in the light curve.

The Transients Classification Pipeline (TCP) is a parallelized, Python-based framework created to identify and classify transient sources in the realtime PTF differencing pipeline (\S XXX). The TCP polls the object database from that pipeline and retrieves all available metadata about recently extracted sources. Using the locations and uncertainties of the transient candidate objects, the TCP either associates an object with an existing known source in the TCP database or, after the object passes several filters that exclude non-stellar (e.g., known minor planets) and non-astrophysical events, generates a new source. The Bayesian framework for this event clustering is given in {2008AN....329..284B}.
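The association step can be sketched as a positional cross-match against known sources, here assuming simple Gaussian positional uncertainties; the function names and the 3-sigma threshold are illustrative, not the TCP's actual Bayesian clustering of {2008AN....329..284B}:

```python
import math

def match_statistic(ra1, dec1, sigma1, ra2, dec2, sigma2):
    """Angular separation (arcsec) and separation in units of the
    combined 1-sigma positional uncertainty (sigmas in arcsec)."""
    # Small-angle separation with a cos(dec) correction on RA
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    sep = math.hypot(dra, ddec) * 3600.0
    return sep, sep / math.hypot(sigma1, sigma2)

def associate(candidate, known_sources, n_sigma=3.0):
    """Return the closest known source within n_sigma of the candidate,
    or None, in which case a new source would be generated."""
    best = None
    for src in known_sources:
        sep, nsig = match_statistic(candidate["ra"], candidate["dec"],
                                    candidate["sigma"],
                                    src["ra"], src["dec"], src["sigma"])
        if nsig <= n_sigma and (best is None or sep < best[1]):
            best = (src, sep)
    return best[0] if best else None
```

A candidate that matches no known source within the threshold falls through to new-source generation.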

Once a transient source has been identified, the TCP generates "features" which map contextual and time-domain properties to a high-dimensional real-number space. After generating a set of features for a transient source candidate, the TCP then applies several science classification tools to determine the most likely science class of that source. For rapid-response transient science, a subset of features --- such as those related to rise times and the distance to nearby galaxies --- are most useful. As light curves become better sampled, more features are used in the classification. The resulting science class probabilities are stored in a database for further data mining applications. Sources with high probabilities of belonging to a science class of interest to the PTF group will be broadcast to the PTF's "Followup Marshal" for scheduling of followup observations. As more observations are made for a known source, the TCP will autonomously regenerate features and potentially reclassify that source.
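Feature generation can be illustrated with a minimal sketch that maps a light curve onto a few real-valued features; the specific features shown (amplitude, time span, an early rise rate) are simple stand-ins, and the contextual features the text mentions (e.g., distance to nearby galaxies) are omitted:

```python
import numpy as np

def extract_features(t, mag):
    """Map a light curve (times in days, magnitudes) onto a small
    real-valued feature dictionary. Illustrative, not the TCP feature set."""
    t = np.asarray(t, dtype=float)
    mag = np.asarray(mag, dtype=float)
    features = {
        "n_points": int(len(t)),
        "amplitude": float(mag.max() - mag.min()),
        "median_mag": float(np.median(mag)),
        "time_span": float(t.max() - t.min()),
    }
    # Early rise rate (mag/day): brightening means magnitude decreases,
    # so measure the drop from the first point to the brightest point.
    i_peak = int(mag.argmin())
    if i_peak > 0:
        features["rise_rate"] = float((mag[0] - mag[i_peak]) / (t[i_peak] - t[0]))
    return features
```

With better-sampled light curves, more entries would simply be added to the returned dictionary, matching the text's point that the usable feature set grows with sampling.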

The TCP classification software is structured to allow several science class schemas, which enables use of more than one hierarchy of science class definitions that could be generated by different astronomy groups. Currently, one of these classification schemas uses science classes generated by the Weka machine learning package {Garner 95}, which has been trained with features derived from existing lightcurves. Another schema is based upon traditional model and light curve fitting (e.g., supernova typing; {2007AJ....134.1285P}). A third classification schema is based upon algorithms from one or more machine learning packages which will be trained using re-sampled lightcurves stored in our TCP lightcurve warehouse. This lightcurve warehouse, accessible through a public portal in the DotAstro Project, contains representative lightcurves from the literature. Currently, 14,000 transients from 87 papers represent 150 science classes.
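The multi-schema design can be sketched as a registry mapping each schema name to a classifier that returns science-class probabilities; the schema names and the callables here are hypothetical placeholders for the Weka-trained model, the template fitter, and so on:

```python
def classify_all(features, schemas):
    """Apply every registered classification schema to one feature set.

    `schemas` maps a schema name to a callable returning
    {science_class: score}; scores are normalized to probabilities
    so each schema's output sums to 1.
    """
    results = {}
    for name, classifier in schemas.items():
        scores = classifier(features)
        total = sum(scores.values()) or 1.0
        results[name] = {cls: s / total for cls, s in scores.items()}
    return results
```

Keeping each hierarchy behind a common callable interface is what lets different astronomy groups supply independent class definitions without changes to the pipeline itself.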

For training purposes, the TCP re-samples the representative light curves in the DotAstro database to match the expected PTF survey cadences and instrument characteristics. The result is a set of science classification algorithms that encompass a wide variety of variable-object science and which can be automatically re-trained and refined as more lightcurve data become available. In the future, astronomer-assigned science classifications of verified PTF transients will also contribute to this lightcurve warehouse.
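The re-sampling step can be sketched under simple assumptions: interpolate a well-observed archival light curve onto a regular survey cadence, add Gaussian photometric noise, and drop points fainter than a limiting magnitude. The parameter values are illustrative, not PTF's actual cadence or noise model:

```python
import numpy as np

def resample_to_cadence(t, mag, cadence_days, mag_limit=21.0,
                        noise_mag=0.05, rng=None):
    """Re-sample an archival light curve onto a survey-like cadence.

    A sketch: linear interpolation, Gaussian noise of `noise_mag` mag,
    and a hard limiting magnitude. Returns (times, magnitudes) of the
    simulated detections.
    """
    rng = rng or np.random.default_rng(0)
    t = np.asarray(t, dtype=float)
    mag = np.asarray(mag, dtype=float)
    t_new = np.arange(t.min(), t.max(), cadence_days)
    mag_new = np.interp(t_new, t, mag) + rng.normal(0.0, noise_mag, t_new.size)
    detected = mag_new < mag_limit  # drop points below the survey limit
    return t_new[detected], mag_new[detected]
```

Training on such re-sampled curves, rather than the original densely sampled ones, is what matches the classifiers to the data the survey will actually deliver.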