During your Coli studies you will run into many difficulties of a theoretical and technical nature. Thankfully, the World Wide Web offers a multitude of resources that can help you deal with them.
This list is an attempt to give an overview of those resources, many of which we have already benefited from ourselves.
There are different kinds of resources:
- Extensive introductions to a field, for which you should set aside some time.
- Web tools for quick calculations or drawings.
- Compendium-like overviews.
- Software libraries.
The Gruppe Technik also maintains a web interface for the version control system SVN, which isn’t used much.
Corpora, parsers, etc.
The wiki page on available resources explains everything you need to know (in German, at least).
Student servers and servers open to students:
- ella.cl.uni-heidelberg.de: Compute server; 32 cores, 125 GiB main memory.
- last.cl.uni-heidelberg.de: Compute server; 40 cores, 504 GiB main memory.
- knopfler.cl.uni-heidelberg.de: Web server; 1 core, 4 GiB main memory; the only way to host publicly accessible web services (contact the Gruppe Technik, GT).
The Gruppe Technik has many interesting facts about infrastructure on their Wiki page (in German).
- Hacker News provides news for hackers, including topics such as machine learning and even Python.
- /r/Python/ is a subreddit with news about Python, e.g. new versions, new modules or projects realised with Python.
- /r/MachineLearning/ is all about machine learning, e.g. new algorithms, new implementations or discussions.
- Web tool for quickly drawing deterministic automata (state machines).
- Web tool for designing diagrams like flow charts.
- Minimum Edit Distance Calculator with visualisations.
- Overleaf allows collaborative editing of LaTeX documents (which you will write a lot of during your studies).
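To make the minimum edit distance from the calculator entry above more concrete, here is a minimal Python sketch of the Levenshtein distance. It assumes unit costs for insertion, deletion and substitution; a given calculator or textbook may use different costs (for example a substitution cost of 2), so its numbers can differ. The function name `edit_distance` is chosen purely for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit costs (assumed cost scheme)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory, which is enough for the distance itself; recovering the alignment that a visualisation shows would require keeping the full table.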
Tutorials and Overviews
Programming and Automation
Even though the shell (and Bash in particular) is important for everyday data processing and programming tasks, there is no introductory course on it (apart from the resource course). Self-study is recommended.
- Introduction to Bash and its environment, as well as tools like sed and awk.
- Introduction to Bash with some good tables at the end.
- The Natural Language Processing with Python page is a tutorial for working with NLTK.
- The YouTube channel Sentdex highlights what is possible with Python (stock market prediction, machine learning, and also NLP).
- The YouTube channel Sirajalogy reports on new research, like Reinforcement Learning or GANs, in an easy-to-digest format for newcomers.
- The LaTeX for Linguists page gives an overview of many LaTeX packages relevant to linguistics.
- A student from Heidelberg provides a good and simple Introduction to LaTeX, with every lecture and assignment available online. Recommended both for working through from start to finish and for quick reference.
Here are a few interesting libraries, so you don’t have to implement everything yourselves.
- fuzzywuzzy: fuzzywuzzy is a library for loose string matching (based on Levenshtein distance), as well as filtering.
- langdetect: langdetect is a quick and easy way to find out which language a text is written in.
- statsmodels: statsmodels includes many features for statistical analysis of data, like correlation.
- sklearn: sklearn implements many machine learning algorithms.
- scipy: scipy implements functionality for scientific computing, like correlation tests or common similarity measures.
- sympy: sympy provides a more natural syntax for mathematical functions in Python, which you can then simplify, differentiate, integrate or evaluate for given numbers.
- pandas: pandas simplifies working with tabular data.
- nltk: nltk includes many text processing features.
- spacy: spacy implements state-of-the-art algorithms for POS tagging, dependency parsing and NER.
- textblob: Your go-to for English text processing, like sentiment analysis.
- textblob-de: Like textblob, but for German.
- polyglot: POS tagging and sentiment analysis in over 100 languages.
- requests: Simplifies working with HTTP requests.
- bs4: Easy parsing for XML and HTML.
- scrapy: Framework for programming scrapers.
- flask: Web framework that leaves you a lot of freedom. Recommended above all for small projects or unusual use cases.
- django: Web framework with a more rigid structure than flask. Useful for bigger projects, especially when using a database.
- keras: Framework for easy creation of neural nets. Abstraction layer on top of Tensorflow or Theano.
- tensorflow: tensorflow is a platform-independent open-source library for artificial intelligence, used in areas such as language processing and image recognition.
- pytorch: pytorch is similar to tensorflow but integrates better with Python. Ordinary Python for loops, for example, can be used directly in model code.
- textract: Extracts text from PDFs.
- jupyter-notebook: Module for combining code and text, well suited for teaching. jupyter-notebook is a web-based interactive Python environment in which, for example, images can be embedded.
- matplotlib: The classic visualisation library for Python.
- seaborn: Add-on for matplotlib with nice defaults and automatic integration of pandas DataFrames.
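To get a feeling for the loose string matching mentioned for fuzzywuzzy in the list above without installing anything, the following minimal sketch uses Python's standard-library `difflib` as a stand-in. It is not fuzzywuzzy's API, and `difflib` scores similarity with a ratio based on matching subsequences rather than Levenshtein distance. The function name `best_match` and the default cutoff of 0.6 are assumptions made for illustration.

```python
import difflib
from typing import Optional, Sequence


def best_match(query: str, candidates: Sequence[str],
               cutoff: float = 0.6) -> Optional[str]:
    """Return the candidate most similar to `query`, or None if no
    candidate reaches the similarity `cutoff` (between 0 and 1)."""
    # get_close_matches returns the matches sorted by similarity, best first.
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    words = ["ape", "apple", "peach", "puppy"]
    print(best_match("appel", words))
    print(best_match("xyz", words))
```

For real projects a dedicated library from the list is likely the better choice; the sketch only shows the idea of ranking candidates by a similarity score and discarding those below a threshold.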