Toolbox

During your Coli studies you will run into many difficulties of both a theoretical and a technical nature. Thankfully, the World Wide Web offers a multitude of resources that can help you deal with them. This list attempts to give an overview of those resources, many of which we have already profited from ourselves.

(Web) Applications

  • Visual Studio Code is an ingenious, extensible open-source editor from Microsoft that has every feature you could wish for. It is available for all desktop operating systems and by now even as a web app. Useful for taking lecture notes, working on exercises, writing code, etc.
  • Joplin is an open-source Markdown editor with extras (LaTeX, …), which is also useful for taking lecture notes, for example.
  • FLACI lets you draw automata and work with formal languages, grammars and regular expressions. Very useful for the ECL and Theoretical Computer Science lectures.
  • diagrams.net (formerly draw.io) allows you to draw flow charts, UI mockups, ER and UML diagrams and a lot more (there also is a VSCode plugin).
  • Minimum Edit Distance Calculator with visualisations.
  • Cryptpad lets you edit files collaboratively (spreadsheets, text, slides, polls, …). Similar to Google Docs or Office Online, but open source and end-to-end encrypted. No account is required to use it.
  • Overleaf allows collaborative editing of LaTeX documents (which you will write a lot of during your studies).
  • Deepnote offers Jupyter Notebooks (if you don’t know them, they are explained further down) in the cloud and allows collaborative editing (like Overleaf, but for code).

Templates

We collect (LaTeX) templates for exercise sheet submissions, seminar papers, etc. in this GitLab repository.

Coli Infrastructure

Version Control

Since Git is the most widely used version control system today, the Gruppe Technik maintains a GitLab instance. GitLab is similar to GitHub: you can create your own projects and collaborate with others.

Corpora, Parsers, etc.

The Wiki page for available resources explains everything necessary (in German, at least).

Servers

Student servers and servers open to students:

  • ella.cl.uni-heidelberg.de: Compute server; 32 cores, 125 GiB main memory.
  • last.cl.uni-heidelberg.de: Compute server; 40 cores, 504 GiB main memory.

GT Tutorial

The Gruppe Technik has many interesting facts about infrastructure on their Wiki page.

Coli News

  • Hacker News provides news for hackers, including topics such as machine learning and Python.
  • /r/Python/ is a subreddit with news about Python, e.g. new versions, new modules or projects realised with Python.
  • /r/MachineLearning/ is all about machine learning, e.g. new algorithms, new implementations or discussions.

Tutorials and Overviews

Programming and Automation

Bash

Even though the shell (and especially Bash) is important for everyday data processing and programming tasks, there is no introductory course on it. Self-study is recommended.

Computational Linguistics

  • The Natural Language Processing with Python page is a tutorial for working with NLTK.
  • The YouTube channel Sentdex highlights what is possible with Python (stock market prediction, machine learning, but also NLP).
  • The YouTube channel Sirajology presents new research, like Reinforcement Learning or GANs, in an easy-to-digest format for newcomers.
  • The channel AI Coffee Break with Letitia is also worth checking out. If you like the videos: Letiţia also teaches seminars and lectures here.

Writing

  • The LaTeX for Linguists page gives an overview of many LaTeX packages relevant to linguistics.
  • A student from Heidelberg provides a good and simple Introduction to LaTeX, for which every lecture and assignment is online. Recommended for working through completely or for quick reference.

Python Libraries

Here are a few interesting libraries, so you don’t have to implement everything yourselves.

String Matching

  • fuzzywuzzy: fuzzywuzzy is a library for fuzzy string matching (based on Levenshtein distance), but also for filtering.
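
To make the idea behind Levenshtein-based matching concrete, here is a minimal pure-Python sketch of the edit distance. It is not fuzzywuzzy's implementation: `levenshtein` and `similarity` are hypothetical helper names chosen for this illustration, and fuzzywuzzy's own scores (for example `fuzz.ratio`, which returns values from 0 to 100) are computed differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # previous[j] is the distance between the already processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: scale the distance to a score in [0.0, 1.0]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

For real projects the library is the better choice, since it is faster and offers more matching strategies than this sketch.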

Language Detection

  • langdetect: langdetect is a quick and easy way to find out which language a text is written in.
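
As a rough illustration of the task itself, the following toy sketch guesses a language by counting stop words. This is explicitly not how langdetect works internally; the word lists and the function name `guess_language` are assumptions made up for this example. With the library, the corresponding call is `detect(text)` after `from langdetect import detect`.

```python
import re
from typing import Optional

# Tiny hand-picked stop word lists; purely illustrative, not from langdetect.
STOP_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language code whose stop words occur most often,
    or None if no known stop word occurs at all."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("Das ist nicht der Hund und die Katze"))  # de
```

A heuristic like this breaks down quickly on short or mixed-language input, which is exactly where a dedicated library pays off.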

Statistical Modules

  • statsmodels: statsmodels includes many features for the statistical analysis of data, like correlation.
  • sklearn: sklearn implements many machine learning algorithms.
  • scipy: scipy implements features for scientific computing, like correlation tests or common similarity measures.
  • sympy: sympy provides a more natural syntax for mathematical functions in Python, which you can then simplify, differentiate, integrate or evaluate with inserted numbers.
  • pandas: pandas simplifies working with tabular data.
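
To show what a correlation computation involves, here is a minimal pure-Python sketch of the Pearson correlation coefficient. The function name `pearson` is an assumption for this illustration; in practice you would rather use `scipy.stats.pearsonr` (which additionally returns a p-value) or `DataFrame.corr` from pandas.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the common 1/n factor cancels out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / math.sqrt(variance_x * variance_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect linear relationship)
```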

Text Processing

  • nltk: nltk includes many text processing features.
  • spacy: spacy implements state-of-the-art algorithms for POS tagging, dependency parsing and NER.
  • textblob: Your go-to for English text processing, like sentiment analysis.
  • textblob-de: Like textblob, but for German.
  • polyglot: POS tagging and sentiment analysis in over 100 languages.
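
For comparison, this is roughly the baseline you get without any of these libraries: a naive regex tokenizer plus a frequency count, using only the standard library. The function names `tokenize` and `most_common_tokens` are assumptions for this sketch. Library tokenizers such as `nltk.word_tokenize` handle punctuation, abbreviations and clitics far more carefully.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def most_common_tokens(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


print(tokenize("The cat sat. The cat ran!"))
# ['the', 'cat', 'sat', 'the', 'cat', 'ran']
```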

Scraping

  • requests: Simplifies working with HTTP requests.
  • bs4: Easy parsing for XML and HTML.
  • scrapy: Framework for programming scrapers.
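
The core of most scrapers is pulling structured pieces out of HTML. The sketch below does this with the standard library's `html.parser` instead of bs4, so it runs without extra installs; `LinkCollector` and `extract_links` are hypothetical names for this illustration. With the libraries listed above you would typically fetch a page with `requests.get(url).text` and search it with BeautifulSoup's `find_all("a")`.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag that is fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="https://example.org">Home</a> <a href="/about">About</a></p>'
print(extract_links(page))  # ['https://example.org', '/about']
```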

Homepages

  • flask: Web framework that leaves you a lot of freedom. Recommended above all for small projects or unusual use cases.
  • django: Web framework with a more rigid structure than flask. Useful for bigger projects, especially when using a database.

Deep Learning

  • keras: Framework for easily creating neural nets. Abstraction layer on top of TensorFlow or Theano.
  • tensorflow: TensorFlow is a platform-independent open-source library for artificial intelligence, used in areas such as language processing and image recognition.
  • pytorch: PyTorch is similar to TensorFlow but integrates better with Python: ordinary Python control flow, such as for loops, works directly in model code.

Text Extraction

  • textract: Extracts text from PDFs.

Cool Stuff

  • jupyter-notebook: Module for combining code and text, well suited for teaching. Jupyter Notebook is a web-based interactive mode of Python in which, for example, images can be embedded.

Visualisation

  • matplotlib: The classic visualisation library for Python.
  • seaborn: Add-on for matplotlib with nice defaults and automatic integration of pandas DataFrames.