An overview of natural language processing (NLP), the task of analyzing and generating human language using computers. It covers various aspects of NLP, including lexical, syntactic, semantic, and discourse analysis, as well as challenges and open problems in the field, and surveys applications of NLP such as information retrieval, document classification, question answering, and machine translation. It also introduces concepts like word sense disambiguation, named entity recognition, and part-of-speech tagging, statistical models like hidden Markov models (HMMs) and naive Bayes, and lexical knowledge resources like WordNet. Overall, the document presents a comprehensive introduction to the field of natural language processing and its various techniques and applications.
Contents
Chapter 1 and 2: Introduction to NLP; Challenges/Open Problems of NLP; Characteristics of NLP; Applications of NLP; Word Segmentation; Parsing (Parse Trees, Top-down and Bottom-up Parsing); Chunking; NER; Sentiment Analysis; Web 2.0 Applications
Chapter 3: HMM; CRF; Naive Bayes
Chapter 4: POS Tagging (Difficulty); Morphology Fundamentals (Types); Automatic Morphology Learning; Finite State Machine Based Morphology; Shallow Parsing
Chapter 5: Dependency Parsing; MaltParser
Chapter 6: Lexical Knowledge Networks; WordNet Theory; Semantic Roles; Metaphors; Word Sense Applications
Chapter 1 and 2 Introduction To NLP:
Dialogue-based applications: These involve human-machine communication. Most naturally this involves spoken language, but it also includes interaction using keyboards. Typical potential applications include:
- question-answering systems, where natural language is used to query a database (for example, a query system to a personnel database)
- automated customer service over the telephone (for example, to perform banking transactions or order items from a catalogue)
- tutoring systems, where the machine interacts with a student (for example, an automated mathematics tutoring system)
- spoken language control of a machine (for example, voice control of a VCR or computer)
- general cooperative problem-solving systems (for example, a system that helps a person plan and schedule freight shipments)
The following list is not complete, but useful systems have been built for: spelling and grammar checking; optical character recognition (OCR); screen readers for blind and partially sighted users; augmentative and alternative communication (i.e., systems to aid people who have difficulty communicating because of disability); machine-aided translation (i.e., systems which help a human translator, e.g., by storing translations of phrases and providing online dictionaries integrated with word processors); lexicographers' tools; information retrieval; document classification (filtering, routing); document clustering; information extraction; question answering; summarization; text segmentation; exam marking; report generation (possibly multilingual); machine translation; natural language interfaces to databases; email understanding; and dialogue systems.
Some NLP Tasks
There are the following NLP tasks:
- Word segmentation
- Topic segmentation and recognition
- Part-of-speech tagging
- Word sense disambiguation
- Named entity recognition (NER)
- Parsing
Word Segmentation
Word segmentation is the problem of dividing a string of written language into its component words. In English and many other languages using some form of the Latin alphabet, the space is a good approximation of a word divider (word delimiter).
Parsing - Parse Trees, Top-down Parsing and Bottom-up Parsing
What is Parsing? Parsing is the process of taking a string and a grammar and returning one (or multiple) parse tree(s) for that string. It is completely analogous to running a finite-state transducer with a tape; it is just more powerful, because there are languages we can capture with CFGs that we cannot capture with finite-state machines.
Example 1 - "John ate the cat"
A top-down strategy starts with S and searches through different ways to rewrite the symbols until it generates the input sentence (or it fails). Thus S is the start, and the parser proceeds through a series of rewrites until the sentence under consideration is found:
S → NP VP → NAME VP → John VP → John V NP → John ate NP → John ate ART N → John ate the N → John ate the cat
In a bottom-up strategy, one starts with the words of the sentence and uses the rewrite rules backward to reduce the sentence symbols until one is left with S:
John ate the cat → NAME ate the cat → NAME V the cat → NAME V ART cat → NAME V ART N → NAME V NP → NAME VP → S
The relative amounts of wasted search depend on how much the grammar branches in each direction.
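To make the top-down strategy concrete, here is a minimal recursive-descent recognizer for the toy grammar of Example 1. This is only a sketch: the grammar dictionary and the parse() helper are illustrative names, not part of the notes.

# A minimal top-down (recursive-descent) recognizer for the toy grammar above.
grammar = {
    'S':    [['NP', 'VP']],
    'NP':   [['NAME'], ['ART', 'N']],
    'VP':   [['V', 'NP']],
    'NAME': [['John']],
    'V':    [['ate']],
    'ART':  [['the']],
    'N':    [['cat']],
}

def parse(symbol, words):
    """Yield the remaining words after deriving a prefix of `words` from `symbol`."""
    if symbol not in grammar:               # terminal: must match the next word
        if words and words[0] == symbol:
            yield words[1:]
        return
    for production in grammar[symbol]:      # try each rewrite of symbol (top-down)
        remainders = [list(words)]
        for sym in production:
            remainders = [rest for r in remainders for rest in parse(sym, r)]
        yield from remainders

sentence = 'John ate the cat'.split()
print(any(rest == [] for rest in parse('S', sentence)))  # True: the sentence parses

The recognizer tries each production for a symbol in turn, mirroring the top-down search, including the wasted work when a branch such as NP → ART N fails on "John".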
Chunking, NER (Named-Entity Recognition)
NER is also known as entity identification, entity chunking, and entity extraction. Named-entity recognition is the problem of segmenting and classifying proper names, such as names of people and organizations, in text. An entity is an individual person, place, or thing in the world, while a mention is a phrase of text that refers to an entity using a proper name. The problem of named-entity recognition is in part one of segmentation, because mentions in English are often multi-word. It is a subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as:
Jim bought 300 shares of Acme Corp. in 2006.
and producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name, and a temporal expression have been detected and classified.
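As a concrete illustration, an off-the-shelf NER system can annotate the example sentence directly. This is a minimal sketch using spaCy, assuming its small English model has been installed (python -m spacy download en_core_web_sm); the exact labels and spans depend on the model, so the printed output here is indicative only.

# Named-entity recognition with spaCy (assumes en_core_web_sm is installed).
import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp('Jim bought 300 shares of Acme Corp. in 2006.')
for ent in doc.ents:
    print(ent.text, '->', ent.label_)
# e.g. Jim -> PERSON, 300 -> CARDINAL, Acme Corp. -> ORG, 2006 -> DATE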
Sentiment Analysis
Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely applied to reviews and social media for a variety of applications, ranging from marketing to customer service. It aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document.
Types of Sentiment Analysis - Sentiment can be analyzed at the document level (the polarity of a whole text), at the sentence level, or at the feature/aspect level (the polarity expressed towards individual attributes of an object).
The advantage of feature-based sentiment analysis is the possibility to capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location, but mediocre food.
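A very simple way to make the polarity idea concrete is a lexicon-based scorer that counts positive and negative words. This is a minimal sketch; the tiny word lists and the polarity() helper are illustrative, not a real sentiment lexicon.

# A minimal lexicon-based polarity scorer (toy word lists for illustration).
POSITIVE = {'good', 'great', 'convenient', 'excellent', 'love'}
NEGATIVE = {'bad', 'poor', 'mediocre', 'terrible', 'hate'}

def polarity(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 'positive' if score > 0 else 'negative' if score < 0 else 'neutral'

print(polarity('The hotel has a convenient location'))   # positive
print(polarity('but the food was mediocre'))             # negative

Note how the two clauses of the hotel example receive opposite polarities, which is exactly the nuance feature-based analysis tries to capture.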
Web 2.0 Applications
Web 2.0 is the term given to describe a second generation of the World Wide Web that is focused on the ability for people to collaborate and share information online. Web 2.0 basically refers to the transition from static HTML Web pages to a more dynamic Web that is more organized and is based on serving Web applications to users. Web 2.0 is the current state of online technology as it compares to the early days of the Web, characterized by greater user interactivity and collaboration, more pervasive network connectivity, and enhanced communication channels.
One of the most significant differences between Web 2.0 and the traditional World Wide Web (WWW, retroactively referred to as Web 1.0) is greater collaboration among Internet users, content providers, and enterprises. Originally, data was posted on Web sites, and users simply viewed or downloaded the content. Increasingly, users have more input into the nature and scope of Web content and in some cases exert real-time control over it. The foundational components of Web 2.0 are the advances enabled by Ajax and other applications such as RSS and Eclipse, and the user empowerment that they support.
Applications:
- Trading - buying, selling or exchanging through user transactions mediated by internet communications
- Media sharing - uploading and downloading media files for purposes of audience or exchange
- Conversational arenas - one-to-one or one-to-many conversations between internet users
- Online games and virtual worlds - rule-governed games or themed environments that invite live interaction with other internet users
- Social networking - websites that structure social interaction between members who form subgroups of 'friends' (e.g., Facebook, Orkut)
- Blogging - an internet-based journal or diary in which a user can post text and digital material while others can comment
- Social bookmarking - users submit their bookmarked web pages to a central site where they can be tagged and found by other users
- Recommender systems - websites aggregate and tag user preferences for items in some domain and thereby make novel recommendations
- Collaborative editing - web tools used collaboratively to design, construct and distribute a digital product
- Wikis - a web-based service allowing users unrestricted access to create, edit and link pages
Chapter 3 HMM:
In the classic example, Bob's daily activity (walking, shopping, or cleaning) depends on the weather (rainy or sunny), which is hidden from Alice; she only hears which activity Bob performed. Alice knows the general weather trends in the area, and what Bob likes to do on average. In other words, the parameters of the HMM are known. They can be represented as follows in Python:

states = ('Rainy', 'Sunny')
observations = ('walk', 'shop', 'clean')
start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
    'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
    'Sunny': {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
    'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
    'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}

In this piece of code, start_probability represents Alice's belief about which state the HMM is in when Bob first calls her; transition_probability represents the change of the weather in the underlying Markov chain; and emission_probability represents how likely Bob is to perform a certain activity on each day.
Applications of HMMs: HMMs can be applied in many fields where the goal is to recover a data sequence that is not immediately observable (but other data that depend on the sequence are). Applications include speech recognition, handwriting recognition, part-of-speech tagging, and gene prediction in bioinformatics.
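Given these parameters, the Viterbi algorithm recovers the most likely weather sequence behind a sequence of Bob's activities. Below is a minimal sketch, assuming the variable names defined above; the viterbi() helper and the example observation sequence are illustrative, not part of the notes.

# Viterbi decoding: most likely hidden state sequence for an observation sequence.
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the most likely state path that ends in s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # choose the best predecessor state for s
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    prob, best = max((V[-1][s], s) for s in states)
    return prob, path[best]

prob, weather = viterbi(('walk', 'shop', 'clean'), states,
                        start_probability, transition_probability,
                        emission_probability)
print(weather, prob)  # ['Sunny', 'Rainy', 'Rainy'] with probability 0.01344

Here the decoder concludes that a walk-shop-clean sequence most likely corresponds to a sunny day followed by two rainy days.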
Naive Bayes
Naive Bayes has been studied extensively since the 1950s. It was introduced under a different name into the text retrieval community in the early 1960s. Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. It is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 10 cm in diameter; a naive Bayes classifier treats each of these features as contributing independently to the probability that the fruit is an apple, regardless of any correlations between them.
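The independence assumption makes training and classification essentially a matter of word counting. Below is a minimal multinomial naive Bayes text classifier with add-one smoothing; the toy spam/ham training data and the function names are illustrative, not from the notes.

# A minimal multinomial naive Bayes text classifier with add-one smoothing.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (token_list, label). Returns class priors, counts, vocabulary."""
    priors, counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in docs:
        priors[label] += 1
        counts[label].update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def classify(tokens, priors, counts, vocab):
    total = sum(priors.values())
    best, best_lp = None, float('-inf')
    for label in priors:
        # log P(label) + sum of log P(token | label), features assumed independent
        lp = math.log(priors[label] / total)
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [("buy cheap pills now".split(), "spam"),
        ("meeting agenda attached".split(), "ham"),
        ("cheap offer click now".split(), "spam"),
        ("lunch meeting tomorrow".split(), "ham")]
model = train(docs)
print(classify("cheap pills".split(), *model))   # -> spam

Despite the unrealistic independence assumption, such classifiers work surprisingly well for text categorization tasks like spam filtering.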
Chapter 4 POS Tagging - Difficulty:
Human ceiling - For POS tagging, human annotator agreement has been reported as 96% (which makes existing POS taggers look impressive). However, this raises lots of questions: relatively untrained human annotators working independently often have quite low agreement, but trained annotators discussing results can achieve much higher performance (approaching 100% for POS tagging). Human performance varies considerably between individuals. In any case, human performance may not be a realistic ceiling on relatively unnatural tasks, such as POS tagging.
Error analysis - The error rate on a particular problem will be distributed very unevenly. For instance, a POS tagger will never confuse the tag PUN (punctuation) with the tag VVN (past participle), but might confuse VVN with AJ0 (adjective) because there is a systematic ambiguity for many forms (e.g., "given"). For a particular application, some errors may be more important than others. For instance, if one is looking for relatively low-frequency cases of denominal verbs (that is, verbs derived from nouns, e.g., "canoe", "tango", "fork" used as verbs), then POS tagging is not directly useful in general, because a verbal use without a characteristic affix is likely to be mistagged. This makes POS tagging less useful for lexicographers, who are often specifically interested in finding examples of unusual word uses. Similarly, in text categorization, some errors are more important than others: e.g., treating an incoming order for an expensive product as junk email is a much worse error than the converse.
Reproducibility - If at all possible, evaluation should be done on a generally available corpus so that other researchers can replicate the experiments.
Morphology Fundamentals - Types; Automatic Morphology Learning; Finite State Machine Based Morphology
Shallow Parsing:
Shallow parsing is an analysis of a sentence which identifies the constituents (noun groups or phrases, verbs, verb groups, etc.), but does not specify their internal structure, nor their role in the main sentence. It is a technique widely used in natural language processing. It is similar to the concept of lexical analysis for computer languages. Under the name of the Shallow Structure Hypothesis, it is also used as an explanation for why second language learners often fail to parse complex sentences correctly. With this technique, we get hierarchical and grammatical information while preserving the robustness and efficiency of the processing. A shallow parser can be seen as a set of production/reduction/cutting rules:
- Rule 1: Open a phrase p for the current category c if c can be the left corner of p.
- Rule 2: Do not open an already opened category if it belongs to the current phrase or is its right corner. Otherwise, we can reopen it if the current word can only be its left corner.
- Rule 3: Close the opened phrases if the most recently opened phrase can neither continue one of them nor be one of their right corners.
- Rule 4: When closing a phrase, apply rules 1, 2 and 3. This may close or open new phrases, taking into consideration all phrase-level categories.
A small chunking sketch follows.
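A common way to implement shallow parsing in practice is with chunking rules over POS tags. This is a minimal sketch using NLTK's RegexpParser; the chunk grammar and the pre-tagged example sentence are illustrative, not the rule set described above.

# Shallow parsing as NP chunking over a POS-tagged sentence (NLTK).
import nltk

# NP chunk: optional determiner, any adjectives, then one or more nouns
grammar = 'NP: {<DT>?<JJ>*<NN.*>+}'
chunker = nltk.RegexpParser(grammar)

sentence = [('the', 'DT'), ('little', 'JJ'), ('dog', 'NN'),
            ('barked', 'VBD'), ('at', 'IN'), ('the', 'DT'), ('cat', 'NN')]
print(chunker.parse(sentence))
# (S (NP the/DT little/JJ dog/NN) barked/VBD at/IN (NP the/DT cat/NN))

The output marks the noun groups without assigning them any internal structure or a role in the sentence, which is exactly the shallow-parsing trade-off.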
Chapter 5 Dependency Parsing:
The dependency approach has a number of advantages over full phrase-structure parsing:
- It deals well with free word order languages, where the constituent structure is quite fluid.
- Parsing is much faster than with CFG-based parsers.
- Dependency structure often captures the syntactic relations needed by later applications; CFG-based approaches often extract this same information from trees anyway.
Example: MaltParser.
Chapter 6 Lexical Knowledge Networks:
WordNet Theory
There are several electronic dictionaries, thesauri, lexical databases, and so forth today. WordNet is one of the largest and most widely used of these. It has been used for many natural language processing tasks, including word sense disambiguation and question answering. This is an attempt to explore and understand the structure of WordNet, how it is used and for what applications, and also to see where its strengths and weaknesses lie. WordNet is the main resource for lexical semantics for English that is used in NLP, primarily because of its very large coverage and the fact that it is freely available. WordNets are under development for many other languages, though so far none are as extensive as the original.
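WordNet can be queried programmatically. This is a minimal sketch using NLTK's WordNet interface, assuming the WordNet data has been downloaded (nltk.download('wordnet')); the chosen words are illustrative.

# Browsing WordNet synsets with NLTK.
from nltk.corpus import wordnet as wn

for syn in wn.synsets('bass')[:3]:       # the first few senses of "bass"
    print(syn.name(), '-', syn.definition())

dog = wn.synset('dog.n.01')
print(dog.hypernyms())      # more general concepts (e.g. the canine synset)
print(dog.lemma_names())    # synonyms grouped in this synset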
Metaphors; Word Sense - Applications
Word sense disambiguation is needed for many applications, but problematic for large domains. It assumes that we have a standard set of word senses (e.g., WordNet). Useful cues include:
- Frequency: e.g., for "diet", the food sense (or senses) is much more frequent than the parliament sense (as in the Diet of Worms).
- Collocations: e.g., "striped bass" (the fish) vs. "bass guitar": the disambiguating word may be syntactically related or simply in a window of words (the latter is sometimes called co-occurrence). Generally there is 'one sense per collocation'.
- Selectional restrictions/preferences: e.g., in "Kim eats bass", the object must refer to the fish.
A combination of unsupervised knowledge-based and supervised machine learning techniques can provide a high-precision system that is able to tag running text with word senses, including:
- a system that acquires a huge number of examples per word from the web;
- the use of sophisticated linguistic information, such as syntactic relations, semantic classes, selectional restrictions, subcategorization information, domain, etc.;
- efficient margin-based machine learning algorithms;
- novel algorithms that combine tagged examples with huge amounts of untagged examples in order to increase the precision of the system.
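The collocation cue above is the basis of simple dictionary-based disambiguation: the simplified Lesk algorithm picks the sense whose gloss overlaps most with the context. Below is a minimal sketch using NLTK's built-in lesk (requires the WordNet data); the sentences are illustrative and the selected synsets depend on the glosses.

# Word sense disambiguation with the simplified Lesk algorithm (NLTK).
from nltk.wsd import lesk

sent = 'Kim caught a huge bass while fishing in the lake'.split()
print(lesk(sent, 'bass'))      # expected: a fish sense of "bass"

sent2 = 'he plays bass guitar in a jazz band'.split()
print(lesk(sent2, 'bass'))     # expected: a musical sense of "bass"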