FAQ

Do you support [feature | encoder | decoder] in Ludwig?

The list of encoders for each feature type is specified in the User Guide. We plan to add more feature types, as well as more encoders and decoders for all feature types. See the question "What additional features are you working on?" below for more details. If you want to help us implement your favourite feature or model, please take a look at the Developer Guide to see how to contribute.

Do all datasets need to be loaded in memory?

At the moment it depends on the feature type: image features can be loaded dynamically from disk from an open HDF5 file, while other feature types (which usually need less memory than images) are loaded entirely in memory for speed. In future releases we plan to add an option to load other features from disk as well, and to support more input file types and more scalable solutions such as Petastorm.
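The difference between the two loading strategies can be illustrated with a minimal sketch. It is not Ludwig's implementation: it uses a plain binary file of float64 rows as a stand-in for HDF5, and the helper names write_rows and iter_batches are hypothetical. The point is only that the reader keeps one batch in memory at a time instead of the whole dataset.

```python
import array
import os
import tempfile


def write_rows(path, rows):
    """Store rows of equal length as raw float64 values, one row after another."""
    with open(path, "wb") as f:
        for row in rows:
            array.array("d", row).tofile(f)


def iter_batches(path, row_len, batch_size):
    """Yield batches of rows, reading only one batch from disk at a time."""
    row_bytes = row_len * array.array("d").itemsize
    with open(path, "rb") as f:
        while True:
            chunk = f.read(row_bytes * batch_size)
            if not chunk:
                break
            values = array.array("d")
            values.frombytes(chunk)
            # Split the flat buffer back into rows of `row_len` values.
            yield [list(values[i:i + row_len])
                   for i in range(0, len(values), row_len)]


# Hypothetical toy dataset: 5 rows with 2 values each.
path = os.path.join(tempfile.mkdtemp(), "features.bin")
write_rows(path, [[float(i), float(i) * 2] for i in range(5)])

# Only one batch is held in memory per iteration.
batch_sizes = [len(batch) for batch in iter_batches(path, row_len=2, batch_size=2)]
```

Loading everything in memory is faster per batch, which is why it is used for the smaller feature types; batch-wise reading trades some speed for a bounded memory footprint.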

My data is on [ GCS | S3 | Azure ], how can I load it?

Ludwig uses Pandas for loading data at the moment (this may change when we move to Petastorm). This means that if your service provides a mechanism for loading data with a name handler, you can load it.

These name handlers already work:

  • Google Cloud Storage: gs://. Install gcsfs with pip install "gcsfs>=0.2.1" and you will be able to provide paths to Ludwig with the gs:// name handler.
  • Amazon S3: s3://. Install boto with pip install boto and you will be able to provide paths to Ludwig with the s3:// name handler.
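As a rough sketch of how the handlers listed above map to packages, the following helper checks a path before it is handed to Ludwig. The function names required_package and check_path are hypothetical and not part of Ludwig or pandas; the scheme-to-package mapping is taken from the list above.

```python
from importlib.util import find_spec
from urllib.parse import urlparse

# Mapping taken from the FAQ answer: URL scheme -> package to install.
REQUIRED_PACKAGE = {"gs": "gcsfs", "s3": "boto"}


def required_package(path):
    """Return the package a remote path needs, or None for other paths."""
    scheme = urlparse(path).scheme
    return REQUIRED_PACKAGE.get(scheme)


def check_path(path):
    """Return `path` unchanged, or raise if its handler package is missing."""
    package = required_package(path)
    if package is not None and find_spec(package) is None:
        raise ImportError(f"{path!r} needs `pip install {package}`")
    return path
```

Assuming a hypothetical bucket named my-bucket, check_path("gs://my-bucket/train.csv") either returns the path, which can then be used wherever a local CSV path would be, or fails early with a hint about the missing package.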

What additional features are you working on?

We will prioritize new features depending on the feedback of the community, but we are already planning to add:

  • additional text and sequence encoders (attention, co-attention, hierarchical attention, Transformer, and ELMo- and BERT-derived models; BERT is already supported).
  • additional image encoders (DenseNet and FractalNet).
  • image decoding (both image generation by deconvolution and pixel-wise classification for image segmentation).
  • time series decoding.
  • additional feature types (point clouds, nested lists, multi-sentence documents, graphs, videos).
  • additional measures and losses.
  • additional data formatters and dataset-specific preprocessing scripts.

We also want to address some of the current limitations:

  • currently the full dataset needs to be loaded in memory in order to train a model. Image features already have a way to dynamically read batches of datapoints from disk, and we want to extend this capability to other datatypes.
  • add a simple user interface to provide a live demo capability.
  • document lower level functions.
  • optimize the data I/O to TensorFlow.
  • increase the number of supported data formats beyond just CSV and integrate with Petastorm.

All of these are opportunities to get involved in the community and contribute. Feel free to reach out to us and ask, as there are tasks for all levels of experience.

Who are the authors of Ludwig?

  • Piero Molino is the main architect and maintainer
  • Yaroslav Dudin is a key contributor
  • Sai Sumanth Miryala contributed all the testing and logging, and helped with polishing

Who else helped develop Ludwig?

  • Yi Shi, who implemented the time series encoding
  • Ankit Jain, who implemented the bag feature encoding
  • Pranav Subramani, who contributed documentation
  • Alex Sergeev and Felipe Petroski Such, who helped with distributed training
  • Emidio Torre, who helped with the initial design of the landing page

How can I cite Ludwig?

Please use this BibTeX entry:

@misc{Molino2019,
  author = {Piero Molino and Yaroslav Dudin and Sai Sumanth Miryala},
  title = {Ludwig: a type-based declarative deep learning toolbox},
  year = {2019},
  eprint = {arXiv:1909.07930},
}