Big data in practice: text analytics (EN/NL/FR)
Description
During this one-day ABIS course, we introduce the most important concepts and terminology of text analysis and "text mining", such as tokens, normalisation, lemmatisation, part-of-speech tagging, language models, and text classification. It will become clear that automated text analysis is complicated: language, grammar, spelling mistakes, synonyms, negation, word order, and punctuation all complicate the analysis. This is because text is meant as a means of communication between humans, not as input to be understood by computers.
We will use packages such as NLTK, Apache OpenNLP, and Stanford's NLP suite. Regular expressions are covered as well.
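As a rough illustration of why word order and negation complicate the analysis, the sketch below tokenises text with a regular expression and counts the tokens. It uses only the Python standard library rather than any of the toolkits named above, and the function names `tokenize` and `bag_of_words` are illustrative assumptions, not part of the course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens.

    Deliberately naive: a token is a run of letters, optionally followed
    by an apostrophe and more letters (so "isn't" stays one token).
    Punctuation and digits are dropped.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def bag_of_words(text):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokenize(text))


# Word order is lost: two sentences with opposite meanings get the same counts.
a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")
print(a == b)  # True

# Negation is lost too: a keyword lookup still finds "good".
review = bag_of_words("The film was not good.")
print("good" in review)  # True
```

Both checks print `True`, which is exactly the problem: a representation based on word counts alone cannot distinguish these cases, so more context (for example n-grams or parsing) is needed.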
Remark: Course description in English; Dutch and French versions are available on the ABIS website. Courses are planned in Dutch, English, and French. Consult the ABIS website for alternate course formats.
Main Topics - Content:
- What is text?
- Building blocks of text: characters and words; grammar; punctuation; word order; language dependencies
- Tokenisation: conceptual and technical aspects; normalisation, including compound words
- Lemmatisation; part-of-speech tagging
- Use of word lists and of corpora
- Syntax and parsing
- Introduction to some popular parsing techniques
- Regular expressions
- Language models
- Statistical models
- "Bag of words"
- TF-IDF (term frequency & inverse document frequency)
- n-grams and frequency distributions
- Natural language processing (NLP)
  - overview of the aspects studied by NLP, like semantics, context, similarity, sentiment analysis
  - text categorisation; clustering techniques; measures for similarity
- NLP software
  - overview of the current state-of-the-art and freely available software toolkits
  - practical examples and exercises with one of the toolkits
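Two of the topics listed above, n-grams with frequency distributions and TF-IDF, can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the approach of any particular toolkit. It assumes the plain, unsmoothed definition tf = count / document length and idf = log(N / document frequency); other variants exist. The names `tokenize`, `ngrams`, and `tf_idf` and the three sample sentences are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens (naive rule)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute unsmoothed TF-IDF weights for a list of token lists.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(number of documents / number of documents containing t)
    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "The cat chased the dog",
]
docs = [tokenize(text) for text in corpus]

# Frequency distribution of bigrams over the whole corpus.
bigram_freq = Counter(bg for tokens in docs for bg in ngrams(tokens, 2))
print(bigram_freq[("sat", "on")])  # 2

weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0
```

The word "the" occurs in every document, so its idf is log(1) = 0 and its weight vanishes, while a term such as "mat" that occurs in only one document receives a positive weight.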
Audience: This training is intended for those who want to start practising "text analytics".
Background: Some familiarity with statistical concepts (histogram, classification, hypothesis tests). Also, a minimal programming background is helpful.
Didactics: Classroom instruction.
Duration: 1 day.