/*!
@file ann.txt
@author Marcus Edel (https://kurg.org)
@brief Tutorial for how to use the neural network code in mlpack.

@page anntutorial Neural Network tutorial

@section intro_anntut Introduction

There is vast literature on neural networks and their uses, as well as
strategies for choosing initial points effectively, keeping the algorithm from
converging in local minima, choosing the best model structure, choosing the best
optimizers, and so forth. mlpack implements many of these building blocks,
making it very easy to create different neural networks in a modular way.

mlpack currently implements two easy-to-use forms of neural networks:
\b Feed-Forward \b Networks (this includes convolutional neural networks) and
\b Recurrent \b Neural \b Networks.

@section toc_anntut Table of Contents

This tutorial is split into the following sections:

 - \ref intro_anntut
 - \ref toc_anntut
 - \ref model_api_anntut
 - \ref layer_api_anntut
 - \ref model_setup_training_anntut
 - \ref model_saving_loading_anntut
 - \ref extracting_parameters_anntut
 - \ref further_anntut

@section model_api_anntut Model API

There are two main neural network classes that are meant to be used as container
for neural network layers that \b mlpack implements; each class is suited to a
different setting:

- \c FFN: the Feed Forward Network model provides a means to plug layers
   together in a feed-forward fully connected manner.  This is the 'standard'
   type of deep learning model, and includes convolutional neural networks
   (CNNs).

- \c RNN: the Recurrent Neural Network model provides a means to consider
   successive calls to forward as different time-steps in a sequence.  This is
   often used for time sequence modeling tasks, such as predicting the next
   character in a sequence.

Below is some basic guidance on what should be used. Note that the question of
"which algorithm should be used" is a very difficult question to answer, so the
guidance below is just that---guidance---and may not be right for a particular
problem.

 - \b Feed-forward \b Networks allow signals or inputs to travel one way only.
   There is no feedback within the network; for instance, the output of any
   layer does only affect the upcoming layer. That makes Feed-Forward Networks
   straightforward and very effective. They are extensively used in pattern
   recognition and are ideally suitable for modeling relationships between a
   set of input and one or more output variables.


 - \b Recurrent \b Networks allow signals or inputs to travel in both directions by
   introducing loops in the network. Computations derived from earlier inputs are
   fed back into the network, which gives the recurrent network some kind of
   memory. RNNs are currently being used for all kinds of sequential tasks; for
   instance, time series prediction, sequence labeling, and
   sequence classification.

In order to facilitate consistent implementations, the \c FFN and \c RNN classes
have a number of methods in common:

 - \c Train(): trains the initialized model on the given input data. Optionally
   an optimizer object can be passed to control the optimization process.

 - \c Predict(): predicts the responses to a given set of predictors. Note the
   responses will reflect the output of the specified output layer.

 - \c Add(): this method can be used to add a layer to the model.

@note
To be able to optimize the network, both classes implement the OptimizerFunction
API. In short, the \c FNN and \c RNN class implement two methods: \c Evaluate()
and \c Gradient().  This enables the optimization given some learner and some
performance measure.

Similar to the existing layer infrastructure, the \c FFN and \c RNN classes are
very extensible, having the following template arguments; which can be modified
to change the behavior of the network:

 - \c OutputLayerType: this type defines the output layer used to evaluate the
   network; by default, \c NegativeLogLikelihood is used.

 - \c InitializationRuleType: this type defines the method by which initial
   parameters are set; by default, \c RandomInitialization is used.

@code
template<
  typename OutputLayerType = NegativeLogLikelihood<>,
  typename InitializationRuleType = RandomInitialization
>
class FNN;
@endcode

Internally, the \c FFN and \c RNN class keeps an instantiated \c OutputLayerType
class (which can be given in the constructor). This is useful for using
different loss functions like the Negative-Log-Likelihood function or the \c
VRClassReward function, which takes an optional score parameter. Therefore, you
can write a non-static OutputLayerType class and use it seamlessly in
combination with the \c FNN and \c RNN class. The same applies to the \c
InitializationRuleType template parameter.

By choosing different components for each of these template classes in
conjunction with the \c Add() method, a very arbitrary network object can be
constructed.

Below are several examples of how the \c FNN and \c RNN classes might be used.
The first examples focus on the \c FNN class, and the last shows how the \c
RNN class can be used.

The simplest way to use the FNN<> class is to pass in a dataset with the
corresponding labels, and receive the classification in return. Note that the
dataset must be column-major – that is, one column corresponds to one point. See
the \ref matrices "matrices guide" for more information.

The code below builds a simple feed-forward network with the default options,
then queries for the assignments for every point in the \c queries matrix.

\dot
digraph G {
  fontname = "Hilda 10"
  rankdir=LR
  splines=line
  nodesep=.08;
  ranksep=1;
  edge [color=black, arrowsize=.5];
  node [fixedsize=true,label="",style=filled,color=none,fillcolor=gray,shape=circle]

  subgraph cluster_0 {
    color=none;
    node [style=filled, color=white, penwidth=15,fillcolor=black shape=circle];
    l10  l11  l12  l13  l14  l15  ;
    label = Input;
  }

  subgraph cluster_1 {
    color=none;
    node [style=filled, color=white, penwidth=15,fillcolor=gray shape=circle];
    l20  l21  l22  l23  l24  l25  l26  l27  ;
    label = Linear;
  }

  subgraph cluster_2 {
    color=none;
    node [style=filled, color=white, penwidth=15,fillcolor=gray shape=circle];
    l30  l31  l32  l33  l34  l35  l36  l37  ;
    label = Linear;
  }

  subgraph cluster_3 {
    color=none;
    node [style=filled, color=white, penwidth=15,fillcolor=black shape=circle];
    l40  l41  l42  ;
    label = LogSoftMax;
  }

  l10 -> l20   l10 -> l21   l10 -> l22   l10 -> l23   l10 -> l24   l10 -> l25
  l10 -> l26   l10 -> l27   l11 -> l20   l11 -> l21   l11 -> l22   l11 -> l23
  l11 -> l24   l11 -> l25   l11 -> l26   l11 -> l27   l12 -> l20   l12 -> l21
  l12 -> l22   l12 -> l23   l12 -> l24   l12 -> l25   l12 -> l26   l12 -> l27
  l13 -> l20   l13 -> l21   l13 -> l22   l13 -> l23   l13 -> l24   l13 -> l25
  l13 -> l26   l13 -> l27   l14 -> l20   l14 -> l21   l14 -> l22   l14 -> l23
  l14 -> l24   l14 -> l25   l14 -> l26   l14 -> l27   l15 -> l20   l15 -> l21
  l15 -> l22   l15 -> l23   l15 -> l24   l15 -> l25   l15 -> l26   l15 -> l27
  l20 -> l30   l20 -> l31   l20 -> l32   l20 -> l33   l20 -> l34   l20 -> l35
  l20 -> l36   l20 -> l37   l21 -> l30   l21 -> l31   l21 -> l32   l21 -> l33
  l21 -> l34   l21 -> l35   l21 -> l36   l21 -> l37   l22 -> l30   l22 -> l31
  l22 -> l32   l22 -> l33   l22 -> l34   l22 -> l35   l22 -> l36   l22 -> l37
  l23 -> l30   l23 -> l31   l23 -> l32   l23 -> l33   l23 -> l34   l23 -> l35
  l23 -> l36   l23 -> l37   l24 -> l30   l24 -> l31   l24 -> l32   l24 -> l33
  l24 -> l34   l24 -> l35   l24 -> l36   l24 -> l37   l25 -> l30   l25 -> l31
  l25 -> l32   l25 -> l33   l25 -> l34   l25 -> l35   l25 -> l36   l25 -> l37
  l26 -> l30   l26 -> l31   l26 -> l32   l26 -> l33   l26 -> l34   l26 -> l35
  l26 -> l36   l26 -> l37   l27 -> l30   l27 -> l31   l27 -> l32   l27 -> l33
  l27 -> l34   l27 -> l35   l27 -> l36   l27 -> l37   l30 -> l40   l30 -> l41
  l30 -> l42   l31 -> l40   l31 -> l41   l31 -> l42   l32 -> l40   l32 -> l41
  l32 -> l42   l33 -> l40   l33 -> l41   l33 -> l42   l34 -> l40   l34 -> l41
  l34 -> l42   l35 -> l40   l35 -> l41   l35 -> l42   l36 -> l40   l36 -> l41
  l36 -> l42   l37 -> l40   l37 -> l41   l37 -> l42
}
\enddot
@note
The number of inputs in the above graph doesn't match with the real
number of features in the thyroid dataset and are just used as an abstract
representation.

@code
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>

using namespace mlpack;
using namespace mlpack::ann;

int main()
{
  // Load the training set and testing set.
  arma::mat trainData;
  data::Load("thyroid_train.csv", trainData, true);
  arma::mat testData;
  data::Load("thyroid_test.csv", testData, true);

  // Split the labels from the training set and testing set respectively.
  arma::mat trainLabels = trainData.row(trainData.n_rows - 1);
  arma::mat testLabels = testData.row(testData.n_rows - 1);
  trainData.shed_row(trainData.n_rows - 1);
  testData.shed_row(testData.n_rows - 1);

  // Initialize the network.
  FFN<> model;
  model.Add<Linear<> >(trainData.n_rows, 8);
  model.Add<SigmoidLayer<> >();
  model.Add<Linear<> >(8, 3);
  model.Add<LogSoftMax<> >();

  // Train the model.
  model.Train(trainData, trainLabels);

  // Use the Predict method to get the predictions.
  arma::mat predictionTemp;
  model.Predict(testData, predictionTemp);

  /*
    Since the predictionsTemp is of dimensions (3 x number_of_data_points)
    with continuous values, we first need to reduce it to a dimension of
    (1 x number_of_data_points) with scalar values, to be able to compare with
    testLabels.

    The first step towards doing this is to create a matrix of zeros with the
    desired dimensions (1 x number_of_data_points).

    In predictionsTemp, the 3 dimensions for each data point correspond to the
    probabilities of belonging to the three possible classes.
  */
  arma::mat prediction = arma::zeros<arma::mat>(1, predictionTemp.n_cols);

  // Find index of max prediction for each data point and store in "prediction"
  for (size_t i = 0; i < predictionTemp.n_cols; ++i)
  {
    // we add 1 to the max index, so that it matches the actual test labels.
    prediction(i) = arma::as_scalar(arma::find(
        arma::max(predictionTemp.col(i)) == predictionTemp.col(i), 1)) + 1;
  }

  /*
    Compute the error between predictions and testLabels,
    now that we have the desired predictions.
  */
  size_t correct = arma::accu(prediction == testLabels);
  double classificationError = 1 - double(correct) / testData.n_cols;

  // Print out the classification error for the testing dataset.
  std::cout << "Classification Error for the Test set: " << classificationError << std::endl;
  return 0;
}
@endcode

Now, the matrix prediction holds the classification of each point in the
dataset. Subsequently, we find the classification error by comparing it
with testLabels.

In the next example, we create simple noisy sine sequences, which are trained
later on, using the RNN class in the `RNNModel()` method.

@code
void GenerateNoisySines(arma::mat& data,
                        arma::mat& labels,
                        const size_t points,
                        const size_t sequences,
                        const double noise = 0.3)
{
  arma::colvec x =  arma::linspace<arma::Col<double>>(0,
      points - 1, points) / points * 20.0;
  arma::colvec y1 = arma::sin(x + arma::as_scalar(arma::randu(1)) * 3.0);
  arma::colvec y2 = arma::sin(x / 2.0 + arma::as_scalar(arma::randu(1)) * 3.0);

  data = arma::zeros(points, sequences * 2);
  labels = arma::zeros(2, sequences * 2);

  for (size_t seq = 0; seq < sequences; seq++)
  {
    data.col(seq) = arma::randu(points) * noise + y1 +
        arma::as_scalar(arma::randu(1) - 0.5) * noise;
    labels(0, seq) = 1;

    data.col(sequences + seq) = arma::randu(points) * noise + y2 +
        arma::as_scalar(arma::randu(1) - 0.5) * noise;
    labels(1, sequences + seq) = 1;
  }
}

void RNNModel()
{
  const size_t rho = 10;

  // Generate 12 (2 * 6) noisy sines. A single sine contains rho
  // points/features.
  arma::mat input, labelsTemp;
  GenerateNoisySines(input, labelsTemp, rho, 6);

  arma::mat labels = arma::zeros<arma::mat>(rho, labelsTemp.n_cols);
  for (size_t i = 0; i < labelsTemp.n_cols; ++i)
  {
    const int value = arma::as_scalar(arma::find(
        arma::max(labelsTemp.col(i)) == labelsTemp.col(i), 1)) + 1;
    labels.col(i).fill(value);
  }

  /**
   * Construct a network with 1 input unit, 4 hidden units and 10 output
   * units. The hidden layer is connected to itself. The network structure
   * looks like:
   *
   *  Input         Hidden        Output
   * Layer(1)      Layer(4)      Layer(10)
   * +-----+       +-----+       +-----+
   * |     |       |     |       |     |
   * |     +------>|     +------>|     |
   * |     |    ..>|     |       |     |
   * +-----+    .  +--+--+       +-----+
   *            .     .
   *            .     .
   *            .......
   */
  Add<> add(4);
  Linear<> lookup(1, 4);
  SigmoidLayer<> sigmoidLayer;
  Linear<> linear(4, 4);
  Recurrent<> recurrent(add, lookup, linear, sigmoidLayer, rho);

  RNN<> model(rho);
  model.Add<IdentityLayer<> >();
  model.Add(recurrent);
  model.Add<Linear<> >(4, 10);
  model.Add<LogSoftMax<> >();

  StandardSGD opt(0.1, 1, input.n_cols /* 1 epoch */, -100);
  model.Train(input, labels, opt);
}
@endcode

For further examples on the usage of the ann classes, see [mlpack
models](https://github.com/mlpack/models).

@section layer_api_anntut Layer API

In order to facilitate consistent implementations, we have defined a LayerType
API that describes all the methods that a \c layer may implement. mlpack offers
a few variations of this API, each designed to cover some of the model
characteristics mentioned in the previous section. Any \c layer requires the
implementation of a \c Forward() method. The interface looks like:

@code
template<typename eT>
void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);
@endcode

The method should calculate the output of the layer given the input matrix and
store the result in the given output matrix. Next, any \c layer must implement
the Backward() method, which uses certain computations obtained during the
forward pass and should calculate the function f(x) by propagating x backward
through f:

@code
template<typename eT>
void Backward(const arma::Mat<eT>& input,
              const arma::Mat<eT>& gy,
              arma::Mat<eT>& g);
@endcode

Finally, if the layer is differentiable, the layer must also implement
a Gradient() method:

@code
template<typename eT>
void Gradient(const arma::Mat<eT>& input,
              const arma::Mat<eT>& error,
              arma::Mat<eT>& gradient);
@endcode

The Gradient function should calculate the gradient with respect to the input
activations \c input and calculated errors \c error and place the results into
the gradient matrix object \c gradient that is passed as an argument.

@note
Note that each method accepts a template parameter InputType, OutputType
or GradientType, which may be arma::mat (dense Armadillo matrix) or arma::sp_mat
(sparse Armadillo matrix). This allows support for both sparse-supporting and
non-sparse-supporting \c layer without explicitly passing the type.

In addition, each layer must implement the Parameters(), InputParameter(),
OutputParameter(), Delta() methods, differentiable layer should also provide
access to the gradient by implementing the Gradient(), Parameters() member
function. Note each function is a single line that looks like:

@code
OutputDataType const& Parameters() const { return weights; }
@endcode

Below is an example that shows each function with some additional boilerplate
code.

@note
Note this is not an actual layer but instead an example that exists to show and
document all the functions that mlpack layer must implement.  For a better
overview of the various layers, see \ref mlpack::ann. Also be aware that the
implementations of each of the methods in this example are entirely fake and do
not work; this example exists for its API, not its implementation.

Note that layer sometimes have different properties. These properties are
known at compile-time through the mlpack::ann::LayerTraits class, and some
properties may imply the existence (or non-existence) of certain functions.
Refer to the LayerTraits @ref layer_traits.hpp for more documentation on that.

The two template parameters below must be template parameters to the layer, in
the order given below. More template parameters are fine, but they must come
after the first two.

 - \c InputDataType: this defines the internally used input type for example to
   store the parameter matrix. Note, a layer could be built on a dense matrix or
   a sparse matrix. All mlpack trees should be able to support any Armadillo-
   compatible matrix type.  When the layer is written it should be assumed that
   MatType has the same functionality as arma::mat. Note that

 - \c OutputDataType: this defines the internally used input type for example to
   store the parameter matrix. Note, a layer could be built on a dense matrix or
   a sparse matrix. All mlpack trees should be able to support any Armadillo-
   compatible matrix type.  When the layer is written it should be assumed that
   MatType has the same functionality as arma::mat.

@code
template<typename InputDataType = arma::mat,
         typename OutputDataType = arma::mat>
class ExampleLayer
{
 public:
  ExampleLayer(const size_t inSize, const size_t outSize) :
      inputSize(inSize), outputSize(outSize)
  {
    /* Nothing to do here */
  }
}
@endcode

The constructor for \c ExampleLayer will build the layer given the input and
output size. Note that, if the input or output size information isn't used
internally it's not necessary to provide a specific constructor. Also, one could
add additional or other information that are necessary for the layer
construction. One example could be:

@code
ExampleLayer(const double ratio = 0.5) : ratio(ratio) {/* Nothing to do here*/}
@endcode

When this constructor is finished, the entire layer will be built and is ready
to be used. Next, as pointed out above, each layer has to follow the LayerType
API, so we must implement some additional functions.

@code
template<typename InputType, typename OutputType>
void Forward(const InputType& input, OutputType& output)
{
  output = arma::ones(input.n_rows, input.n_cols);
}

template<typename InputType, typename ErrorType, typename GradientType>
void Backward(const InputType& input, const ErrorType& gy, GradientType& g)
{
  g = arma::zeros(gy.n_rows, gy.n_cols) + gy;
}

template<typename InputType, typename ErrorType, typename GradientType>
void Gradient(const InputType& input,
              ErrorType& error,
              GradientType& gradient)
{
  gradient = arma::zeros(input.n_rows, input.n_cols) * error;
}
@endcode

The three functions \c Forward(), \c Backward() and \c Gradient() (which is
needed for a differentiable layer) contain the main logic of the layer. The
following functions are just to access and manipulate the different layer
parameters.

@code
OutputDataType& Parameters() { return weights; }
InputDataType& InputParameter() { return inputParameter; }
OutputDataType& OutputParameter() { return outputParameter; }
OutputDataType& Delta() { return delta; }
OutputDataType& Gradient() { return gradient; }
@endcode

Since some of this methods return internal class members we have to define them.

@code
private:
  size_t inSize, outSize;
  OutputDataType weights, delta, gradient, outputParameter;
  InputDataType inputParameter;
@endcode

Note some members are just here so \c ExampleLayer compiles without warning.
For instance, \c inputSize is not required to be a member of every type of
layer.

There is one last method that is especially interesting for a layer that shares
parameter. Since the layer weights are set once the complete model is defined,
it's not possible to split the weights during the construction time. To solve
this issue, a layer can implement the \c Reset() method which is called once the
layer parameter is set.

@section model_setup_training_anntut Model Setup & Training

Once the base container is selected (\c FNN or \c RNN), the \c Add method can be
used to add layers to the model.  The code below adds two linear layers to the
model---the first takes 512 units as input and gives 256 output units, and
the second takes 256 units as input and gives 128 output units.

@code
FFN<> model;
model.Add<Linear<> >(512, 256);
model.Add<Linear<> >(256, 128);
@endcode

The model is trained on Armadillo matrices. For training a model, you will
typically use the \c Train() function:

@code
arma::mat trainingSet, trainingLabels;
model.Train(trainingSet, trainingLabels);
@endcode

You can use mlpack's \c Load() function to load a dataset like this:

@code
arma::mat trainingSet;
data::Load("dataset.csv", dataset, true);
@endcode

@code
$ cat dataset.csv
0, 1, 4
1, 0, 5
1, 1, 1
2, 0, 2
@endcode

The type does not necessarily need to be a CSV; it can be any supported storage
format, assuming that it is a coordinate-format file in the format specified
above.  For more information on mlpack file formats, see the documentation for
mlpack::data::Load().

@note
It’s often a good idea to normalize or standardize your data, for example using:

@code
for (size_t i = 0; i < dataset.n_cols; ++i)
  dataset.col(i) /= norm(dataset.col(i), 2);
@endcode

Also, it is possible to retrain a model with new parameters or with
a new reference set. This is functionally equivalent to creating a new model.

@section model_saving_loading_anntut Saving & Loading

Using \c boost::serialization (for more information about the internals see
[Serialization - Boost C++ Libraries](www.boost.org/libs/serialization/doc/)),
mlpack is able to load and save machine learning models with ease. To save a
trained neural network to disk. The example below builds a model on the \c
thyroid dataset and then saves the model to the file \c model.xml for later use.

@code
// Load the training set.
arma::mat dataset;
data::Load("thyroid_train.csv", dataset, true);

// Split the labels from the training set.
arma::mat trainData = dataset.submat(0, 0, dataset.n_rows - 4,
    dataset.n_cols - 1);

// Split the data from the training set.
arma::mat trainLabelsTemp = dataset.submat(dataset.n_rows - 3, 0,
    dataset.n_rows - 1, dataset.n_cols - 1);

// Initialize the network.
FFN<> model;
model.Add<Linear<> >(trainData.n_rows, 3);
model.Add<SigmoidLayer<> >();
model.Add<LogSoftMax<> >();

// Train the model.
model.Train(trainData, trainLabels);

// Use the Predict method to get the assignments.
arma::mat assignments;
model.Predict(trainData, assignments);

data::Save("model.xml", "model", model, false);
@endcode

After this, the file model.xml will be available in the current working
directory.

Now, we can look at the output model file, \c model.xml:

@code
$ cat model.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="15">
<model class_id="0" tracking_level="0" version="0">
  <parameter class_id="1" tracking_level="1" version="0" object_id="_0">
    <n_rows>66</n_rows>
    <n_cols>1</n_cols>
    <n_elem>66</n_elem>
    <vec_state>0</vec_state>
    <item>-7.55971528334903642e+00</item>
    <item>-9.95435955058058930e+00</item>
    <item>9.31133928948225353e+00</item>
    <item>-5.36784434861701953e+00</item>
    ...
  </parameter>
  <width>0</width>
  <height>0</height>
  <currentInput object_id="_1">
    <n_rows>0</n_rows>
    <n_cols>0</n_cols>
    <n_elem>0</n_elem>
    <vec_state>0</vec_state>
  </currentInput>
  <network class_id="2" tracking_level="0" version="0">
    <count>3</count>
    <item_version>0</item_version>
    <item class_id="3" tracking_level="0" version="0">
      <which>18</which>
      <value class_id="4" tracking_level="1" version="0" object_id="_2">
        <inSize>21</inSize>
        <outSize>3</outSize>
      </value>
    </item>
    <item>
      <which>2</which>
      <value class_id="5" tracking_level="1" version="0" object_id="_3"></value>
    </item>
    <item>
      <which>20</which>
      <value class_id="6" tracking_level="1" version="0" object_id="_4"></value>
    </item>
  </network>
</model>
</boost_serialization>
@endcode

As you can see, the \c \<parameter\> section of \c model.xml contains the trained
network weights.  We can see that this section also contains the network input
size, which is 66 rows and 1 column. Note that in this example, we used three
different layers, as can be seen by looking at the \c \<network\> section. Each
node has a unique id that is used to reconstruct the model when loading.

The models can also be saved as \c .bin or \c .txt; the \c .xml format provides
a human-inspectable format (though the models tend to be quite complex and may
be difficult to read). These models can then be re-used to be used for
classification or other tasks.

So, instead of saving or training a network, mlpack can also load a pre-trained
model. For instance, the example below will load the model from \c model.xml and
then generate the class predictions for the \c thyroid test dataset.

@code
data::Load("thyroid_test.csv", dataset, true);

arma::mat testData = dataset.submat(0, 0, dataset.n_rows - 4,
    dataset.n_cols - 1);

data::Load("model.xml", "model", model);

arma::mat predictions;
model.Predict(testData, predictions);
@endcode

This enables the possibility to distribute a model without having to train it
first or simply to save a model for later use. Note that loading will also work
on different machines.

@section extracting_parameters_anntut Extracting Parameters

To access the weights from the neural network layers, you can call the following
function on any initialized network:

@code
model.Parameters();
@endcode

which will return the complete model parameters as an armadillo matrix object;
however often it is useful to not only have the parameters for the complete
network, but the parameters of a specific layer. Another method, \c Model(),
makes this easily possible:

@code
model.Model()[1].Parameters();
@endcode

In the example above, we get the weights of the second layer.

@section further_anntut Further documentation

For further documentation on the ann classes, consult the \ref mlpack::ann
"complete API documentation".

*/
