ABSTRACT
During the last decade, considerable effort has been made to classify variable
stars using different machine learning techniques. Typically, light curves are
represented as vectors of statistical descriptors or features that are used to train
various algorithms. Computing these features demands significant computational power
and can take from hours to days, making it impossible to build scalable and efficient
pipelines for automatically classifying variable stars. Moreover, light curves from
different surveys cannot be integrated and analyzed together when using features,
because of observational differences: with variations in cadence and filters, feature
distributions become biased and require expensive data-calibration models. The vast
amount of data that will be generated soon makes it necessary to develop scalable
machine learning architectures that do not rely on expensive integration techniques.
Convolutional Neural Networks have shown impressive results in raw image classification
and representation within the machine learning literature. In this work, we present a
novel Deep Learning model for light curve classification, based mainly on convolutional
units. Our architecture receives as input the differences between consecutive times and
magnitudes of a light curve, capturing the essential classification patterns regardless
of cadence and filter. In addition, we introduce a novel data augmentation scheme for
unevenly sampled time series. We test our method on three different surveys, OGLE-III,
Corot, and VVV, which differ in filters, cadence, and sky coverage. We show that, besides
the benefit of scalability, our model obtains state-of-the-art accuracy in light curve
classification benchmarks.
Key words: light curves – variable stars – supervised classification – neural net –
deep learning
Figure 1. Comparison of RR Lyrae ab and Cepheid stars in the OGLE-III, VISTA, and Corot surveys, respectively. Differences in magnitude and cadence are shown.
coverage among surveys makes it difficult to share data. The differences in filter and cadence make it even harder without any transformation of the data. Figure 1 shows an example of the complexity that exists among stars and surveys. Light curves differ in magnitude and time, and most of the time they are not recognizable by the human eye, even by experts. Since all the magnitudes are calibrated using statistics, this does not work correctly because of underlying differences between surveys. Figure 2 shows a comparison of statistical features of RR Lyrae ab stars using three different catalogs. To the best of our knowledge, few efforts have been made to create invariant training sets across datasets. Benavente et al. 2017 proposed an automatic survey-invariant model of variable stars that transforms FATS statistical vectors (Nun et al. 2015) from one survey to another. As previously mentioned, these features have the problem of being computationally expensive, and the creation of new ones requires considerable time and research. Therefore, there is a need for faster techniques able to use data from different surveys.

Artificial neural networks (ANNs) have been known for decades (Cybenko 1989; Hornik 1991), but the vast amount of data needed to train them made them infeasible in the past. The power of current telescopes and the amount of data they generate have practically solved this problem. The improvements in technology and the large amount of data make ANNs feasible for the future challenges in astronomy.

Artificial neural networks or deep neural networks create their own representation by combining and encoding the input data using non-linear functions (LeCun et al. 2015). Depending on the number of hidden layers, the capacity for extracting features improves, together with the need for more data (LeCun et al. 2015). Convolutional neural networks (CNNs) are a particular type of neural network that have shown essential advantages in extracting features from images (Krizhevsky et al. 2012). CNNs use filters and convolutions that respond to patterns of different spatial frequency, allowing the network to learn how to capture the most critical underlying patterns, and they have won most of the classification challenges (Krizhevsky et al. 2012). Time series, like images, have also proven to be a suitable domain for CNNs (Zheng et al. 2014; Jiang & Liang 2016).

In this paper, we propose a convolutional neural network architecture that uses raw light curves from different surveys. Our model can encode light curves and classify between classes and subclasses of variability. Our approach
2 RELATED WORK
As mentioned before, there have been huge efforts to classify variable stars (Richards et al. 2011; Nun et al. 2015, 2014; Huijse et al. 2014; Mackenzie et al. 2016; Pichara et al. 2016a; Valenzuela & Pichara 2017b). The main approach has been the extraction of features that represent the information of light curves. Debosscher et al. 2007 were the first to propose 28 different features extracted from photometric analysis. Sarro et al. 2009 continued this work by introducing color information using the OGLE survey and carrying out an extensive error analysis. Richards et al. 2011 use 32 periodic features as well as kurtosis, skewness, standard deviation, and the Stetson indices, among others, for variable star classification. Pichara et al. 2012 improve quasar detection by using boosted tree ensembles with continuous auto-regressive (CAR) features. Pichara & Protopapas 2013 introduce a probabilistic graphical model to classify variable stars using catalogs with missing data.

Kim et al. 2014 use 22 features for classifying classes and subclasses of variable stars using random forest. Nun et al. 2015 published a library named FATS (Feature Analysis for Time Series) that facilitates the extraction of features from light curves. More than 65 features are compiled and put together in a Python library. Kim & Bailer-Jones 2016 publish a library for variable star classification among seven classes and subclasses. The library extracts sixteen features that are considered survey-invariant and uses random forest for the classification process.

A novel approach that differs from most of the previous papers is proposed by Mackenzie et al. 2016, who face the light curve representation problem by designing and implementing an unsupervised feature learning algorithm. Their work uses a sliding window that moves over the light curve and captures most of the underlying patterns that represent every light curve. This window extracts features that are as good as traditional statistics, solving the problem of high computational power and removing the human from the

Figure 2. Comparison of FATS features in RR Lyrae ab stars using histogram plots of stars from the OGLE-III, Corot, and Vista surveys. Every feature is shown with its relative importance in classification as mentioned in Nun et al. 2015.
tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})    (3)
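As a quick numerical sanity check of equation (3), the hyperbolic tangent can be computed directly from its exponential form; a minimal sketch using numpy:

```python
import numpy as np

def tanh_from_exp(x):
    # tanh built from its exponential definition, equation (3)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)
# The explicit form agrees with numpy's built-in tanh
assert np.allclose(tanh_from_exp(x), np.tanh(x))
print(round(float(tanh_from_exp(1.0)), 4))  # 0.7616
```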
one, except for the input layer, which does not have any input, and the output layer, which does not have any output. A fully connected layer is one in which every neuron in one layer connects to every neuron in the next one. The vanilla architecture consists of two fully connected layers.

The number of perceptrons for each layer depends on the architecture chosen and therefore on the complexity of the model. A neural network can have hundreds, thousands or millions of them. The experience of the team, as well as experimenting with different architectures, is critical for choosing the number of layers, the perceptrons for each one, and the number of filters to be used. The number of hyperparameters is mainly given by the weights in the architecture. The input layer is where we submit our data and has as many neurons as our input does. The hidden layer is the one in charge of combining the inputs and creating a suitable representation. Finally, the number of neurons in the output layer equals the number of classes we want to classify.

Many architectures have been proposed for artificial neural networks. The vanilla architecture can be modified in the number of hidden layers and the number of perceptrons per layer. ANNs with one hidden layer using sigmoid functions are capable of approximating any continuous function on a subset of R^n (Cybenko 1989). However, the number of neurons needed to do this increases significantly, which can be computationally infeasible. Adding more layers with fewer perceptrons can achieve the same results without affecting the performance of the net (Hornik 1991). Networks with more than three hidden layers are considered deep neural networks (DNNs). DNNs extract information or features by combining outputs from perceptrons, but the number of weights and the amount of data needed to train them increase significantly (LeCun et al. 2015).

To train artificial neural networks we find the weights that minimize a loss function. For classification purposes, one of the most used loss functions is the categorical cross-entropy for unbalanced datasets (De Boer et al. 2005). Initially, weights are chosen at random and are updated between epochs. We compare the desired output with the actual one and seek to minimize the loss function using backpropagation with some form of Stochastic Gradient Descent (SGD) (Ruder 2016). We then update each weight using the negative of the gradient scaled by a learning rate, as shown in Werbos 1990.

Figure 5. Step by step of a convolution process. A sliding window and a moving step are applied to the data and given as input to the next layer.

Training artificial neural networks with backpropagation can be slow. Many methods have been proposed based on stochastic gradient descent (SGD) (Ruder 2016). The massive astronomical datasets make training infeasible in practice, and mini-batches are used to speed up the process (LeCun et al. 1998). A training epoch corresponds to a pass over the entire dataset, and usually many epochs are needed to achieve good results. The way weights are updated can change as well. One of the most widely used optimizers has been the Adam optimizer, as described in Kingma & Ba 2014. It relies on the first moment (mean) and second moment (variance) of the gradient to update the learning rates. Ruder 2016 presents an overview of the different gradient descent optimizers and the advantages and disadvantages of each one.

3.2 Convolutional Neural Nets

Convolutional neural networks (CNNs) are a type of deep neural network widely used on images (Krizhevsky et al. 2012; LeCun et al. 2015). A CNN consists of an input and an output layer, as well as several hidden layers different from fully connected ones.

A convolutional layer is a particular type of hidden layer used in CNNs. Convolutional layers are in charge of extracting information using a sliding window. As shown in Figure 5, the window obtains local patterns from the input and combines them linearly with its weights (dotted line), then applies a nonlinear function and passes the result to the next layer. The sliding window moves and extracts local information from different inputs but with the same weights. The idea is to specialize this window to extract specific information from
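To make the weight-update rule above concrete, the following minimal sketch runs gradient descent on a single mini-batch with a simple linear model and squared loss. This is an illustration only, not the network used in this paper, and all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 3))          # one mini-batch of 256 examples
true_w = np.array([2.0, -1.0, 0.5])    # hypothetical target weights
y = X @ true_w

w = np.zeros(3)                        # weights initialized (here) at zero
learning_rate = 0.1
for _ in range(200):
    grad = 2.0 * X.T @ (X @ w - y) / len(X)  # gradient of the mean squared error
    w -= learning_rate * grad                # step against the gradient
print(np.allclose(w, true_w, atol=1e-3))     # True
```

In practice the gradient is obtained by backpropagation and each step uses a different mini-batch; optimizers such as Adam additionally adapt the step size per weight.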
4 METHOD DESCRIPTION

We propose an architecture that can classify variable stars using different surveys. We now explain each layer of our architecture, depicted in Figure 6.

Our architecture transforms each light curve into a matrix representation using the differences between points. We use two convolutional layers to extract the local patterns and turn them into a flat layer. Two fully connected layers are used, and an output layer is plugged in at the end to perform the classification. In the following subsections, we describe and give insights on each of the layers.

4.1 Pre-processing

In this phase, light curves are transformed into a matrix. Having a balanced dataset is critical for our purpose of multi-survey classification. Therefore, we use N_Max as the maximum number of stars we can extract per class and survey. Section 6.3 explains in detail the selection of the light curves for the database.

We transform each of these light curves into a matrix representation of size 2 × N, where 2 corresponds to the number of channels (time and magnitude) and N to the number of points used per light curve. Figure 7 shows an example of a light curve in a matrix representation.

To compare light curves between catalogs, a reshape of the matrix must be made. Light curves differ in magnitude and time, and to compare them the differences between observations are used. A matrix of size M × 2 × N is created, where M, 2 and N correspond to the number of light curves, channels, and number of observations used. Figure 7 shows an example of the transformation of a light curve. Section 6.1 explains this part of the process in detail.

4.2 First Convolution

We apply a convolutional layer to each of the channels in separate branches with shared weights. We use a shared convolutional layer to preserve the objective of integrating datasets with different cadences. Shared layers mean that each of the filters is the same on every tower. The number of filters is given by S1. We chose 64 filters, to match the number of features presented in Nun et al. 2015.

4.4 Flatten Layer

After extracting the local patterns, we transform the last convolution into a flatten layer as in Jiang & Liang 2016; Zheng et al. 2014. Our layer afterwards combines its patterns with a hidden layer in a fully connected way.

4.5 Hidden Layer

We use a hidden layer to combine our extracted patterns, and the number of cells is given by n_cells. After several experiments, we found that 128 cells generate the best results. We performed many experiments using sigmoid, relu and tanh activation functions. We obtain the best results using tanh activation, as most of the deep learning literature suggests (LeCun et al. 1998).

4.6 Softmax Layer

In the output layer, there is one node for each of the possible variability classes. We test two different numbers of classes: one with 4 classes of variable stars and the other with 9 subclasses. We use a softmax function to shrink the output to the [0, 1] range. We can interpret the numbers from the output nodes as the probability that the light curve belongs to the class represented by that node.

Finally, we minimize the average across training examples using categorical cross-entropy. We use categorical cross-entropy as our loss function as it obtained the best results and the datasets used are unbalanced.

5 DATA

We apply our method to variable star classification using three different surveys: "The Optical Gravitational Lensing Experiment" (OGLE) (Udalski 2004), "The Vista Variables in the Via Lactea" (VVV) (Minniti et al. 2010) and "Convection, Rotation and planetary Transits" (CoRot) (Baglin et al. 2002; Bordé et al. 2003). We select these surveys because of their differences in cadence and filters. In the following subsections, we explain each of these surveys in detail.
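The matrix representation described in Section 4.1 can be sketched as follows; this is a toy example with hypothetical values, where an N-point light curve yields N − 1 consecutive time and magnitude differences:

```python
import numpy as np

# Toy light curve: observation times (days) and magnitudes (hypothetical values)
time = np.array([1.0, 3.5, 4.0, 9.2])
mag = np.array([15.2, 15.4, 15.1, 15.3])

# Two channels of consecutive differences: delta-time and delta-magnitude
matrix = np.vstack([np.diff(time), np.diff(mag)])
print(matrix.shape)   # (2, 3)
```

Stacking M such matrices gives the M × 2 × N input described in Section 4.1.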
Class                                    Label    N
First-Overtone (1O) Classical Cepheid    CEP10    5
Fundamental-Mode (F) Classical Cepheid   CEPF     23
RR Lyrae ab                              RRab     10567
RR Lyrae c                               RRc      4579
Mira                                     Mira     8445
Semi-Regular Variables                   SRV      37366
Small Amplitude Red Giants               OSARGs   182795
Contact Eclipsing Binary                 EC       1818
Semi-Detached Eclipsing Binary           nonEC    786

RR Lyrae ab                              RRab     28
RR Lyrae c                               RRc      481
Figure 8. Example of a new light curve using a burning parameter of 2 and a step parameter of 1.

with the previous measurements. This removes the extinction, distance, and survey-specific biases. Moreover, it acts as a normalization method. It enables the network to learn patterns directly from the observations without the need to pre-process any of the data or apply extinction corrections.

6.2 Light Curve padding

The differences in cadence among the OGLE-III, Vista, and Corot catalogs create a large variance in the number of observations per light curve. To overcome this problem, we impose a minimum number of observations and use zero padding to complete the light curves that cannot reach that minimum. This is inspired by the padding procedure used in deep learning for image analysis. To define this limit, we tried many different values and noticed that classification results do not change significantly within a range of 500 to 1,500 observations. We fixed the limit at 500 points per light curve because that amount preserves the classification accuracy and keeps most of the light curves suitable for our analysis.

6.3 A Light curve data augmentation model

Hensman & Masko 2015 studied the impact of unbalanced datasets on convolutional neural networks and showed that for better classification performance the dataset should be class-balanced. Since the datasets used in this paper are unbalanced, data augmentation techniques have to be applied (Krizhevsky et al. 2012).

To balance the dataset, we propose a novel data augmentation technique based on light curve replicates. As mentioned before in Section 4, the number of stars per class and survey is given by N_Max. If the number of light curves per class and survey is larger than this parameter, the replication process does not take place. Otherwise, the light curves are replicated until they reach that limit. Each class is replicated using two light curve parameters: burning and step. The burning parameter indicates how many points we have to discard in the light curve. The step parameter tells every how many points we should take samples. The burning

7 PARAMETER INITIALIZATION

As in most deep learning solutions, our architecture needs several initial parameters to be set up. In this section, we explain the model design and how to set up its parameters.

7.1 Parameters

As previously noted, surveys have different optics and observation strategies, which impact the depth at which they can observe. This affects the number of variable stars detected and cataloged. In our case, OGLE has been operating longer and observes large portions of the sky, while VVV goes deeper but over a smaller area, and Corot has great time resolution but is considerably shallower. The combined catalog is dominated by OGLE stars, and the subclasses are highly unbalanced, with the LPV class being the majority of them.

In order to train with a more balanced dataset, we use a limit of 8,000 stars per class and survey. We tested different values and set it to 8,000 as most of the classes and subclasses of the VISTA and OGLE surveys possess that amount, as shown in Section 5. Finally, after several experiments measuring training speed and efficiency, we set the batch size to 256. Table 4 shows a summary of the parameters of our architecture.

7.2 Layers

We use two convolutional layers as done in Jiang & Liang 2016. In the imaging literature, several works show that one convolutional layer is not enough to learn a suitable representation, and commonly two convolutions are used (Zheng et al. 2014; Jiang & Liang 2016; Gieseke et al. 2017). We tried using only one convolution and performance was critically reduced. Three convolutions were also tested, producing results as good as using two, but the training time and the number of parameters increase significantly.

A window size t_w was used for the convolution process, set to 42 observations or 250 days on average, as done in Mackenzie et al. 2016; Valenzuela & Pichara 2017a. Finally, a stride value s_w was used, set to 2 or 12 days on average, as done in Mackenzie et al. 2016; Valenzuela & Pichara 2017a.

7.3 Activation functions

In the convolutional layers we used the relu activation function, as it is capable of extracting the important information:

relu(x) = max(0, x) (5)
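To see how the layers of Sections 4 and 7 fit together, the following numpy sketch mimics a single forward pass with the reported hyperparameters (two convolutions with 64 filters, window of 42, stride of 2, shared weights across the two channel branches, a 128-cell tanh hidden layer, and a softmax output). The weights are random placeholders, not trained values, and the branch-merging details are our assumption:

```python
import numpy as np

def conv1d(x, kernels, stride):
    """Valid 1-D convolution followed by relu.
    x: (length, in_channels); kernels: (n_filters, width, in_channels)."""
    n_filters, width, _ = kernels.shape
    n_out = (x.shape[0] - width) // stride + 1
    out = np.empty((n_out, n_filters))
    for i in range(n_out):
        window = x[i * stride:i * stride + width]        # (width, in_channels)
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)                          # relu activation

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_points, n_classes = 500, 4
lc = rng.normal(size=(n_points, 2))     # columns: delta-time, delta-magnitude

# First convolution: the SAME 64 filters (window 42, stride 2) on each channel branch
k1 = 0.05 * rng.normal(size=(64, 42, 1))
branches = [conv1d(lc[:, c:c + 1], k1, stride=2) for c in range(2)]
h = np.concatenate(branches, axis=1)    # (230, 128)

# Second convolution, then flatten
k2 = 0.05 * rng.normal(size=(64, 42, h.shape[1]))
flat = conv1d(h, k2, stride=2).ravel()  # (95 * 64,)

# Fully connected tanh hidden layer (128 cells) and softmax output
w_hidden = 0.01 * rng.normal(size=(128, flat.size))
w_out = 0.01 * rng.normal(size=(n_classes, 128))
probs = softmax(w_out @ np.tanh(w_hidden @ flat))
print(probs.shape)                      # (4,)
```

In the actual model these weights are learned end-to-end with Adam and categorical cross-entropy, with a batch size of 256.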
Table 5. Approximate time for extracting features and training the algorithms.

Method          Extraction of Features   Training Algorithm   Total Run Time
RF              11.5 days                36 min               11.52 days
Classes CNN     30 min                   50 min               1.33 hrs
Subclasses CNN  30 min                   91.8 min             2.03 hrs

Table 6. Accuracy per class and subclass for each survey.

Class          CNN           RF
ECL-OGLE       0.98 ± 0.01   0.97 ± 0.01
ECL-VVV        0.92 ± 0.02   0.89 ± 0.03
ECL-Corot      0.00 ± 0.00   0.91 ± 0.04
LPV-OGLE       0.99 ± 0.00   0.97 ± 0.01
LPV-VVV        0.94 ± 0.01   0.97 ± 0.01
LPV-Corot      0.92 ± 0.11   0.00 ± 0.00
RRLyr-OGLE     0.94 ± 0.01   0.97 ± 0.00
RRLyr-VVV      0.94 ± 0.01   0.86 ± 0.02
RRLyr-Corot    0.00 ± 0.00   0.58 ± 0.07
CEP-OGLE       0.90 ± 0.03   0.93 ± 0.01
CEP-VVV        0.00 ± 0.00   0.08 ± 0.17
CEP-Corot      0.90 ± 0.08   0.46 ± 0.12

Subclass       CNN           RF
EC-OGLE        0.93 ± 0.01   0.91 ± 0.01
EC-VVV         0.69 ± 0.04   0.71 ± 0.03
nonEC-OGLE     0.98 ± 0.01   0.94 ± 0.01
nonEC-VVV      0.51 ± 0.04   0.37 ± 0.05
Mira-OGLE      0.98 ± 0.01   0.98 ± 0.00
Mira-VVV       0.94 ± 0.01   0.29 ± 0.02
SRV-OGLE       0.92 ± 0.02   0.93 ± 0.01
SRV-VVV        0.00 ± 0.00   0.67 ± 0.02
Osarg-OGLE     0.90 ± 0.01   0.88 ± 0.01
Osarg-VVV      0.00 ± 0.00   0.72 ± 0.01
RRab-OGLE      0.72 ± 0.03   0.96 ± 0.01
RRab-VVV       0.77 ± 0.02   0.83 ± 0.01
RRab-Corot     0.11 ± 0.15   0.48 ± 0.29
RRc-OGLE       0.86 ± 0.02   0.98 ± 0.00
RRc-VVV        0.75 ± 0.03   0.83 ± 0.02
RRc-Corot      0.01 ± 0.01   0.99 ± 0.01
CEP10-OGLE     0.84 ± 0.03   0.92 ± 0.02
CEP10-VVV      0.00 ± 0.00   0.00 ± 0.00
CEPF-OGLE      0.72 ± 0.02   0.90 ± 0.01
CEPF-VVV       0.00 ± 0.00   0.00 ± 0.00

Note that a significant improvement in the feature extraction process could be made if the FATS library supported GPUs.

Our proposed architecture and RF are trained using a computer with 128 GB of RAM, a GeForce GTX 1080 Ti GPU and 6 CPUs. Our algorithm is developed using the Keras (Chollet et al. 2015) framework, which runs on top of the Tensorflow (Abadi et al. 2015) library. We use the scikit-learn (Pedregosa et al. 2011) implementation of RF with default settings, except for the minimum samples per leaf, which was set to 100 for better accuracy.

We can see that our method is significantly faster, as it works with raw magnitudes and times and requires only a couple of minutes of pre-processing.

8.2 Results with general classes of variability

We test our model using four general classes: (i) Cepheids (CEP), (ii) Long Period Variables (LPV), (iii) RR Lyrae (RRLyr) and (iv) Eclipsing Binaries (ECL). The distribution of classes and subclasses per survey is shown in Tables 1, 2 and 3. Figures 11 and 12 show the results of using our convolutional architecture and RF, respectively. Table 6 summarizes the accuracy per class for both approaches.

Figure 11. Confusion matrix per class and survey for the convolutional neural network. Empty cells correspond to 0%.

As can be seen, RF achieves 96% accuracy on the OGLE-III dataset, as it has more labeled data than the other surveys. In VVV, RF obtains 97% accuracy in some of the classes that have more labeled data (LPV), but not in stars with few labeled examples (ECL and CEP). In Corot, RF achieves 91% accuracy only in ECL stars, mainly because the high cadence of Corot makes it infeasible to extract features correctly, especially those related to periodicity. The RF results show that FATS features of some light curves (LPV and RRLyr) can be used to classify accurately across different surveys. That is not a surprise, mainly because period features are less sensitive to changes in cadence.

Our proposed architecture achieves comparable classification accuracy in OGLE-III but with much less training time. As shown in Figure 9, our model produces approximately 97% accuracy on the validation set. Each of the colors represents one training run of the 10-fold stratified cross-validation. As shown in Table 6, the OGLE-III dataset achieves 95% accuracy on average in all of its classes. The VVV survey achieves 93.7% accuracy in most of its classes, except for CEP stars, of which there are fewer than 40 light curves. Compared to RF, our model achieves better performance in VVV, with 92% accuracy in each of the classes except CEP. In Corot, CNN and RF achieve comparable results.

8.3 Results with subclasses of variability

We test our model using nine subclasses: (i) First-Overtone Classical Cepheid (CEP10), (ii) Fundamental-Mode Classical Cepheid (CEPF), (iii) RR Lyrae ab (RRab), (iv) RR Lyrae c (RRc), (v) Mira, (vi) Semi-Regular Variables (SRV), (vii) Small Amplitude Red Giants (OSARGs), (viii) Eclipsing Binaries (EC) and (ix) Semi-Detached Eclipsing Binaries (nonEC). The distribution of subclasses per survey is shown in Tables 1, 2 and 3. Figures 13 and 14 show the results of using our neural network architecture and Random Forest, respectively. Table 6 summarizes the accuracy per subclass for both approaches.

As shown in Figure 14, RF achieves better accuracy in RR Lyrae and Cepheid stars, despite the small number of light curves and mainly because of the data augmentation technique. As shown in Table 6, the OGLE-III dataset achieves more than 90% accuracy in most of the subclasses, as it has more labeled data than the other surveys. In the VVV survey, 80% accuracy is obtained in RRab and RRc stars. Finally, Corot's catalog achieves 99% accuracy in RRc stars.

As shown in Figure 10, our model produces approximately 85% accuracy on the validation set. In the OGLE-III dataset, nonEC and Mira stars achieve 98% accuracy, and EC, SRV, and Osarg stars 92%. In the VVV survey, our model achieves 76% accuracy in RRab and RRc stars, and 94% in Mira stars. However, Osarg and SRV are confused
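The per-class accuracies reported in Table 6 can be read off a confusion matrix like those in Figures 11 to 14 as the diagonal divided by the row sums; a toy sketch with hypothetical counts:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted)
cm = np.array([[90,  5,  5],
               [10, 80, 10],
               [ 0, 20, 80]])

per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(per_class_accuracy)   # [0.9 0.8 0.8]
```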
in 81% and 93% of cases with the Mira type, respectively, which indicates a clear overfitting of LPV stars. None of the models can correctly classify VVV Cepheids, mainly because of the low number of light curves (28 light curves in total). With EC classes from VVV we achieve better results than RF. Finally, the accuracy achieved in Corot is the lowest, mainly because of the small number of light curves used (28 RRab and 481 RRc stars).

Figure 12. Confusion matrix per class and survey using the Random Forest algorithm. Empty cells correspond to 0%.

9 CONCLUSIONS

In this work, we have presented a CNN architecture to classify variable stars, tested on light curves integrated from various surveys. The proposed model can learn from the sequence of differences between magnitude and time, automatically discovering patterns across light curves even with odd cadences and bands. We show that multi-survey classification is possible with just one architecture, and we believe it deserves further attention soon. The proposed model is comparable to RF in classification accuracy but much better in scalability. Also, our approach can correctly classify most of the classes and subclasses of variability.

Like most deep learning approaches, our model is capable of learning its own light curve representation, allowing astronomers to use the raw time series as inputs.

In order to have an extra comparison point, we attempted to compare our method with the approach presented in Mahabal et al. (2017). We implemented their algorithm and ran it with the same catalogs we use for our method, taking 500 observations per light curve. Mahabal's method did not generate results, because it takes about 5 hrs to process just one light curve. The extra computational cost mainly comes from the fact that their algorithm generates the 2D embeddings by comparing every pair of points in each light curve, something that makes it impractical for our experimental setup. To obtain results with Mahabal's approach, we decreased the number of observations per light curve until their method returned results in a comparable amount of time. With about 100 observations, their method takes about 5 minutes to run, still too slow for our setup (about 140,000 light curves, 8,000 light curves per class, etc.). By using 50 observations, their method could generate results. As we expected, their classification results are much worse, given that the small number of observations makes it impossible to capture even the light curve periods in most cases. We are not saying that their method is worse than ours; it is just intended for another problem setup. We believe that it is not a fair comparison to add these results to our paper, because Mahabal's method would be using much less information compared to ours.

As future work, oversampling techniques should be studied, given that we observe that our model is sensitive to unbalanced training sets. Also, more complex architectures must be developed to improve classification accuracy. New approaches able to deal with few light curves in some classes are needed. For example, simulation models based on astrophysics would be a significant contribution, especially for the most underrepresented subclasses of variability. With simulation models, deep learning architectures could be significantly improved, given that
Figure 13. Confusion matrix per class and survey for the convolutional neural network. Empty cells correspond to 0.
Figure 14. Confusion matrix per class and survey using Random Forest algorithm. Empty cells correspond to 0.
REFERENCES

Abadi M., et al., 2015, TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, https://www.tensorflow.org/
Abell P. A., et al., 2009, arXiv preprint arXiv:0912.0201
Baglin A., et al., 2002, in Stellar structure and habitable planet finding. pp 17–24
Basheer I., Hajmeer M., 2000, Journal of Microbiological Methods, 43, 3
Belokurov V., Evans N. W., Du Y. L., 2003, Monthly Notices of the Royal Astronomical Society, 341, 1373
Benavente P., Protopapas P., Pichara K., 2017, The Astrophysical Journal, 845, 18pp
Bloom J., Richards J., 2011, Advances in Machine Learning and Data Mining for Astronomy
Bordé P., Rouan D., Léger A., 2003, Astronomy & Astrophysics, 405, 1137
Borne K. D., Strauss M., Tyson J., 2007, Bulletin of the American Astronomical Society, 39, 137
Breiman L., 2001, Machine Learning, 45, 5
Cabrera-Vives G., Reyes I., Förster F., Estévez P. A., Maureira J.-C., 2017, The Astrophysical Journal, 836, 97
Chollet F., et al., 2015, Keras, https://github.com/fchollet/keras
Cybenko G., 1989, Mathematics of Control, Signals, and Systems