The Algorithms logo
算法
关于我们捐赠

使用条件随机场进行命名实体识别

H
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Named Entity Recognition with Conditional Random Fields"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the classic challenges of Natural Language Processing is sequence labelling. In sequence labelling, the goal is to label each word in a text with a word class. In part-of-speech tagging, these word classes are parts of speech, such as noun or verb. In named entity recognition (NER), they're types of generic named entities, such as locations, people or organizations, or more specialized entities, such as diseases or symptoms in the healthcare domain. In this way, sequence labelling can help us extract the most important information from a text and improve the performance of analytics, search or matching applications. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook we'll explore Conditional Random Fields, the most popular approach to sequence labelling before Deep Learning arrived. Deep Learning may get all the attention right now, but Conditional Random Fields are still a powerful tool to build a simple sequence labeller. \n",
    "\n",
    "The tool we're going to use is `sklearn-crfsuite`. This is a wrapper around `python-crfsuite`, which itself is a Python binding of [CRFSuite](http://www.chokkan.org/software/crfsuite/). The reason we're using `sklearn-crfsuite` is that it provides a number of handy utility functions, for example for evaluating the output of the model. You can install it with `pip install sklearn-crfsuite`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we get some data. A well-known data set for training and testing NER models is the CoNLL-2002 data, which has Spanish and Dutch texts labelled with four types of entities: locations (LOC), persons (PER), organizations (ORG) and miscellaneous entities (MISC). Both corpora are split up in three portions: a training portion and two smaller test portions, one of which we'll use as development data. It's easy to collect the data from NLTK. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nltk\n",
    "import sklearn\n",
    "from sklearn.metrics import classification_report, confusion_matrix\n",
    "from sklearn.preprocessing import LabelBinarizer\n",
    "import sklearn_crfsuite as crfsuite\n",
    "from sklearn_crfsuite import metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_sents = list(nltk.corpus.conll2002.iob_sents('ned.train'))\n",
    "dev_sents = list(nltk.corpus.conll2002.iob_sents('ned.testa'))\n",
    "test_sents = list(nltk.corpus.conll2002.iob_sents('ned.testb'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data consists of a list of tokenized sentences. For each of the tokens we have the string itself, its part-of-speech tag and its entity tag, which follows the BIO convention. In the deep learning world we live in today, it's common to ignore the part-of-speech tags. However, since CRFs rely on good feature extraction, we'll gladly make use of this information. After all, the part of speech of a word tells us a lot about its possible status as a named entity: nouns will more often be entities than verbs, for example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('De', 'Art', 'O'),\n",
       " ('tekst', 'N', 'O'),\n",
       " ('van', 'Prep', 'O'),\n",
       " ('het', 'Art', 'O'),\n",
       " ('arrest', 'N', 'O'),\n",
       " ('is', 'V', 'O'),\n",
       " ('nog', 'Adv', 'O'),\n",
       " ('niet', 'Adv', 'O'),\n",
       " ('schriftelijk', 'Adj', 'O'),\n",
       " ('beschikbaar', 'Adj', 'O'),\n",
       " ('maar', 'Conj', 'O'),\n",
       " ('het', 'Art', 'O'),\n",
       " ('bericht', 'N', 'O'),\n",
       " ('werd', 'V', 'O'),\n",
       " ('alvast', 'Adv', 'O'),\n",
       " ('bekendgemaakt', 'V', 'O'),\n",
       " ('door', 'Prep', 'O'),\n",
       " ('een', 'Art', 'O'),\n",
       " ('communicatiebureau', 'N', 'O'),\n",
       " ('dat', 'Conj', 'O'),\n",
       " ('Floralux', 'N', 'B-ORG'),\n",
       " ('inhuurde', 'V', 'O'),\n",
       " ('.', 'Punc', 'O')]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_sents[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feature Extraction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whereas today neural networks are expected to learn the relevant features of the input texts themselves, this is very different with Conditional Random Fields. CRFs learn the relationship between the features we give them and the label of a token in a given context. They're not going to earn these features themselves. Instead, the quality of the model will depend highly on the relevance of the features we show it. \n",
    "\n",
    "The most important method in this tutorial is therefore the one that collects the features for every token. What information could be useful? The word itself, of course, together with its part of speech tag. It can also be interesting to know whether the word is completely uppercase, whether it starts with a capital or is a digit. In addition, we also take a look at the character bigram and trigram the word ends with. We also give every token a `bias` feature, which always has the same value. This bias feature helps the CRF learn the relative frequency of each label type in the training data.\n",
    "\n",
    "To give the CRF more information about the meaning of a word, we also introduce information from word embeddings. In our [Word Embedding notebook](https://github.com/nlptown/nlp-notebooks/blob/master/An%20Introduction%20to%20Word%20Embeddings.ipynb), we trained word embeddings on Dutch Wikipedia and clustered them in 500 clusters. Here we'll read these 500 clusters from a file, and map each word to the id of the cluster it is in. This is really useful for Named Entity Recognition, as most entity types cluster together. This allows CRFs to generalize above the word level. For example, when the CRF encounters a word it has never seen (say, *Albania*), it can base its decision on the cluster the word is in. If this cluster contains many other entities the CRF has met in its training data (say, *Italy*, *Germany* and *France*), it will have learnt a string link between this cluster and a specific entity type. As a result, it can still assign that entity type to the unknown word. In our experiments, this feature alone boosts the performance with around 3%. \n",
    "\n",
    "Finally, apart from the token itself, we also want the CRF to look at its context. More specifically, we're going to give it some extra information about the two words to the left and the right of the targt word. We'll tell the CRF what these words are, whether they start with a capital or are completely uppercase, and give it their part-of-speech tag. If there is no left or right context, we'll inform the CRF that the token is at the beginning or end of the sentence (`BOS` or `EOS`). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_clusters(cluster_file):\n",
    "    word2cluster = {}\n",
    "    with open(cluster_file) as i:\n",
    "        for line in i:\n",
    "            word, cluster = line.strip().split('\\t')\n",
    "            word2cluster[word] = cluster\n",
    "    return word2cluster\n",
    "\n",
    "\n",
    "def word2features(sent, i, word2cluster):\n",
    "    word = sent[i][0]\n",
    "    postag = sent[i][1]\n",
    "    features = [\n",
    "        'bias',\n",
    "        'word.lower=' + word.lower(),\n",
    "        'word[-3:]=' + word[-3:],\n",
    "        'word[-2:]=' + word[-2:],\n",
    "        'word.isupper=%s' % word.isupper(),\n",
    "        'word.istitle=%s' % word.istitle(),\n",
    "        'word.isdigit=%s' % word.isdigit(),\n",
    "        'word.cluster=%s' % word2cluster[word.lower()] if word.lower() in word2cluster else \"0\",\n",
    "        'postag=' + postag\n",
    "    ]\n",
    "    if i > 0:\n",
    "        word1 = sent[i-1][0]\n",
    "        postag1 = sent[i-1][1]\n",
    "        features.extend([\n",
    "            '-1:word.lower=' + word1.lower(),\n",
    "            '-1:word.istitle=%s' % word1.istitle(),\n",
    "            '-1:word.isupper=%s' % word1.isupper(),\n",
    "            '-1:postag=' + postag1\n",
    "        ])\n",
    "    else:\n",
    "        features.append('BOS')\n",
    "\n",
    "    if i > 1: \n",
    "        word2 = sent[i-2][0]\n",
    "        postag2 = sent[i-2][1]\n",
    "        features.extend([\n",
    "            '-2:word.lower=' + word2.lower(),\n",
    "            '-2:word.istitle=%s' % word2.istitle(),\n",
    "            '-2:word.isupper=%s' % word2.isupper(),\n",
    "            '-2:postag=' + postag2\n",
    "        ])        \n",
    "\n",
    "        \n",
    "    if i < len(sent)-1:\n",
    "        word1 = sent[i+1][0]\n",
    "        postag1 = sent[i+1][1]\n",
    "        features.extend([\n",
    "            '+1:word.lower=' + word1.lower(),\n",
    "            '+1:word.istitle=%s' % word1.istitle(),\n",
    "            '+1:word.isupper=%s' % word1.isupper(),\n",
    "            '+1:postag=' + postag1\n",
    "        ])\n",
    "    else:\n",
    "        features.append('EOS')\n",
    "\n",
    "    if i < len(sent)-2:\n",
    "        word2 = sent[i+2][0]\n",
    "        postag2 = sent[i+2][1]\n",
    "        features.extend([\n",
    "            '+2:word.lower=' + word2.lower(),\n",
    "            '+2:word.istitle=%s' % word2.istitle(),\n",
    "            '+2:word.isupper=%s' % word2.isupper(),\n",
    "            '+2:postag=' + postag2\n",
    "        ])\n",
    "\n",
    "        \n",
    "    return features\n",
    "\n",
    "\n",
    "def sent2features(sent, word2cluster):\n",
    "    return [word2features(sent, i, word2cluster) for i in range(len(sent))]\n",
    "\n",
    "def sent2labels(sent):\n",
    "    return [label for token, postag, label in sent]\n",
    "\n",
    "def sent2tokens(sent):\n",
    "    return [token for token, postag, label in sent]\n",
    "\n",
    "word2cluster = read_clusters(\"data/embeddings/clusters_nl.tsv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['bias',\n",
       " 'word.lower=de',\n",
       " 'word[-3:]=De',\n",
       " 'word[-2:]=De',\n",
       " 'word.isupper=False',\n",
       " 'word.istitle=True',\n",
       " 'word.isdigit=False',\n",
       " 'word.cluster=38',\n",
       " 'postag=Art',\n",
       " 'BOS',\n",
       " '+1:word.lower=tekst',\n",
       " '+1:word.istitle=False',\n",
       " '+1:word.isupper=False',\n",
       " '+1:postag=N',\n",
       " '+2:word.lower=van',\n",
       " '+2:word.istitle=False',\n",
       " '+2:word.isupper=False',\n",
       " '+2:postag=Prep']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sent2features(train_sents[0], word2cluster)[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = [sent2features(s, word2cluster) for s in train_sents]\n",
    "y_train = [sent2labels(s) for s in train_sents]\n",
    "\n",
    "X_dev = [sent2features(s, word2cluster) for s in dev_sents]\n",
    "y_dev = [sent2labels(s) for s in dev_sents]\n",
    "\n",
    "X_test = [sent2features(s, word2cluster) for s in test_sents]\n",
    "y_test = [sent2labels(s) for s in test_sents]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now create a CRF model and train it. We'll use the standard [L-BFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) algorithm for our parameter estimation and run it for 100 iterations. When we're done, we save the model with `joblib`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "loading training data to CRFsuite: 100%|██████████| 15806/15806 [00:02<00:00, 7623.17it/s]\n",
      "loading dev data to CRFsuite:  27%|██▋       | 769/2895 [00:00<00:00, 7689.13it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "loading dev data to CRFsuite: 100%|██████████| 2895/2895 [00:00<00:00, 7186.08it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Holdout group: 2\n",
      "\n",
      "Feature generation\n",
      "type: CRF1d\n",
      "feature.minfreq: 0.000000\n",
      "feature.possible_states: 0\n",
      "feature.possible_transitions: 0\n",
      "0....1....2....3....4....5....6....7....8....9....10\n",
      "Number of features: 152117\n",
      "Seconds required: 0.424\n",
      "\n",
      "L-BFGS optimization\n",
      "c1: 0.000000\n",
      "c2: 1.000000\n",
      "num_memories: 6\n",
      "max_iterations: 100\n",
      "epsilon: 0.000010\n",
      "stop: 10\n",
      "delta: 0.000010\n",
      "linesearch: MoreThuente\n",
      "linesearch.max_iterations: 20\n",
      "\n",
      "Iter 1   time=0.37  loss=104214.83 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.00\n",
      "Iter 2   time=0.21  loss=96997.81 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.13\n",
      "Iter 3   time=0.21  loss=92085.38 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.26\n",
      "Iter 4   time=0.21  loss=84277.67 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.51\n",
      "Iter 5   time=0.21  loss=67577.53 active=152117 precision=0.169  recall=0.113  F1=0.109  Acc(item/seq)=0.902 0.496  feature_norm=2.32\n",
      "Iter 6   time=0.21  loss=47854.26 active=152117 precision=0.326  recall=0.347  F1=0.320  Acc(item/seq)=0.930 0.580  feature_norm=4.34\n",
      "Iter 7   time=0.21  loss=43326.19 active=152117 precision=0.340  recall=0.365  F1=0.333  Acc(item/seq)=0.933 0.592  feature_norm=5.06\n",
      "Iter 8   time=0.21  loss=38617.07 active=152117 precision=0.372  recall=0.399  F1=0.362  Acc(item/seq)=0.938 0.618  feature_norm=6.43\n",
      "Iter 9   time=0.21  loss=35511.85 active=152117 precision=0.491  recall=0.442  F1=0.421  Acc(item/seq)=0.942 0.631  feature_norm=8.62\n",
      "Iter 10  time=0.21  loss=32735.31 active=152117 precision=0.535  recall=0.456  F1=0.445  Acc(item/seq)=0.944 0.654  feature_norm=9.50\n",
      "Iter 11  time=0.21  loss=31687.17 active=152117 precision=0.530  recall=0.491  F1=0.502  Acc(item/seq)=0.947 0.663  feature_norm=10.94\n",
      "Iter 12  time=0.21  loss=29323.19 active=152117 precision=0.532  recall=0.482  F1=0.477  Acc(item/seq)=0.946 0.666  feature_norm=11.18\n",
      "Iter 13  time=0.21  loss=28733.55 active=152117 precision=0.596  recall=0.489  F1=0.489  Acc(item/seq)=0.947 0.668  feature_norm=11.58\n",
      "Iter 14  time=0.21  loss=27120.69 active=152117 precision=0.640  recall=0.507  F1=0.512  Acc(item/seq)=0.948 0.677  feature_norm=12.36\n",
      "Iter 15  time=0.21  loss=24849.05 active=152117 precision=0.640  recall=0.547  F1=0.558  Acc(item/seq)=0.952 0.697  feature_norm=13.86\n",
      "Iter 16  time=0.40  loss=24033.40 active=152117 precision=0.654  recall=0.580  F1=0.586  Acc(item/seq)=0.954 0.706  feature_norm=14.53\n",
      "Iter 17  time=0.21  loss=22935.94 active=152117 precision=0.669  recall=0.578  F1=0.598  Acc(item/seq)=0.955 0.712  feature_norm=15.15\n",
      "Iter 18  time=0.21  loss=21803.53 active=152117 precision=0.682  recall=0.584  F1=0.605  Acc(item/seq)=0.956 0.713  feature_norm=15.67\n",
      "Iter 19  time=0.21  loss=21046.75 active=152117 precision=0.725  recall=0.565  F1=0.586  Acc(item/seq)=0.956 0.717  feature_norm=16.04\n",
      "Iter 20  time=0.21  loss=20465.96 active=152117 precision=0.701  recall=0.566  F1=0.594  Acc(item/seq)=0.956 0.711  feature_norm=15.81\n",
      "Iter 21  time=0.21  loss=19991.29 active=152117 precision=0.706  recall=0.586  F1=0.611  Acc(item/seq)=0.957 0.717  feature_norm=15.63\n",
      "Iter 22  time=0.21  loss=19560.23 active=152117 precision=0.684  recall=0.597  F1=0.616  Acc(item/seq)=0.957 0.722  feature_norm=15.58\n",
      "Iter 23  time=0.21  loss=19241.14 active=152117 precision=0.672  recall=0.602  F1=0.615  Acc(item/seq)=0.957 0.726  feature_norm=15.65\n",
      "Iter 24  time=0.21  loss=18787.87 active=152117 precision=0.678  recall=0.627  F1=0.637  Acc(item/seq)=0.958 0.731  feature_norm=16.31\n",
      "Iter 25  time=0.21  loss=18145.13 active=152117 precision=0.690  recall=0.625  F1=0.640  Acc(item/seq)=0.959 0.734  feature_norm=16.94\n",
      "Iter 26  time=0.21  loss=17786.38 active=152117 precision=0.710  recall=0.621  F1=0.642  Acc(item/seq)=0.959 0.738  feature_norm=17.48\n",
      "Iter 27  time=0.21  loss=17247.02 active=152117 precision=0.711  recall=0.625  F1=0.649  Acc(item/seq)=0.959 0.736  feature_norm=18.46\n",
      "Iter 28  time=0.21  loss=16876.01 active=152117 precision=0.737  recall=0.627  F1=0.655  Acc(item/seq)=0.960 0.748  feature_norm=19.83\n",
      "Iter 29  time=0.21  loss=16543.87 active=152117 precision=0.732  recall=0.631  F1=0.663  Acc(item/seq)=0.961 0.751  feature_norm=20.12\n",
      "Iter 30  time=0.21  loss=16263.21 active=152117 precision=0.725  recall=0.644  F1=0.671  Acc(item/seq)=0.962 0.753  feature_norm=20.42\n",
      "Iter 31  time=0.21  loss=15665.78 active=152117 precision=0.715  recall=0.661  F1=0.676  Acc(item/seq)=0.963 0.758  feature_norm=21.50\n",
      "Iter 32  time=0.21  loss=15247.34 active=152117 precision=0.700  recall=0.641  F1=0.650  Acc(item/seq)=0.961 0.748  feature_norm=23.18\n",
      "Iter 33  time=0.21  loss=14866.51 active=152117 precision=0.702  recall=0.658  F1=0.670  Acc(item/seq)=0.963 0.754  feature_norm=24.67\n",
      "Iter 34  time=0.21  loss=14650.96 active=152117 precision=0.704  recall=0.659  F1=0.675  Acc(item/seq)=0.963 0.755  feature_norm=25.38\n",
      "Iter 35  time=0.21  loss=14386.98 active=152117 precision=0.730  recall=0.677  F1=0.695  Acc(item/seq)=0.964 0.761  feature_norm=26.47\n",
      "Iter 36  time=0.21  loss=14158.49 active=152117 precision=0.742  recall=0.681  F1=0.704  Acc(item/seq)=0.965 0.763  feature_norm=28.58\n",
      "Iter 37  time=0.21  loss=13895.30 active=152117 precision=0.736  recall=0.684  F1=0.701  Acc(item/seq)=0.965 0.765  feature_norm=28.72\n",
      "Iter 38  time=0.21  loss=13656.45 active=152117 precision=0.730  recall=0.683  F1=0.695  Acc(item/seq)=0.965 0.768  feature_norm=29.07\n",
      "Iter 39  time=0.21  loss=13499.28 active=152117 precision=0.727  recall=0.680  F1=0.691  Acc(item/seq)=0.965 0.769  feature_norm=29.61\n",
      "Iter 40  time=0.21  loss=13174.95 active=152117 precision=0.726  recall=0.677  F1=0.689  Acc(item/seq)=0.965 0.766  feature_norm=31.03\n",
      "Iter 41  time=0.21  loss=13104.00 active=152117 precision=0.736  recall=0.662  F1=0.678  Acc(item/seq)=0.964 0.760  feature_norm=33.06\n",
      "Iter 42  time=0.21  loss=12750.79 active=152117 precision=0.731  recall=0.685  F1=0.701  Acc(item/seq)=0.966 0.764  feature_norm=34.35\n",
      "Iter 43  time=0.21  loss=12637.02 active=152117 precision=0.740  recall=0.690  F1=0.708  Acc(item/seq)=0.966 0.766  feature_norm=34.63\n",
      "Iter 44  time=0.21  loss=12534.20 active=152117 precision=0.745  recall=0.692  F1=0.712  Acc(item/seq)=0.966 0.766  feature_norm=35.30\n",
      "Iter 45  time=0.21  loss=12390.89 active=152117 precision=0.739  recall=0.682  F1=0.702  Acc(item/seq)=0.966 0.761  feature_norm=37.15\n",
      "Iter 46  time=0.21  loss=12277.42 active=152117 precision=0.733  recall=0.689  F1=0.704  Acc(item/seq)=0.966 0.763  feature_norm=37.58\n",
      "Iter 47  time=0.21  loss=12219.03 active=152117 precision=0.739  recall=0.690  F1=0.706  Acc(item/seq)=0.966 0.767  feature_norm=37.20\n",
      "Iter 48  time=0.21  loss=12125.64 active=152117 precision=0.743  recall=0.695  F1=0.711  Acc(item/seq)=0.967 0.770  feature_norm=36.99\n",
      "Iter 49  time=0.21  loss=11970.65 active=152117 precision=0.745  recall=0.697  F1=0.712  Acc(item/seq)=0.967 0.775  feature_norm=36.84\n",
      "Iter 50  time=0.21  loss=11780.73 active=152117 precision=0.754  recall=0.696  F1=0.718  Acc(item/seq)=0.968 0.777  feature_norm=37.57\n",
      "Iter 51  time=0.21  loss=11623.83 active=152117 precision=0.740  recall=0.691  F1=0.710  Acc(item/seq)=0.967 0.774  feature_norm=38.21\n",
      "Iter 52  time=0.21  loss=11549.38 active=152117 precision=0.740  recall=0.690  F1=0.709  Acc(item/seq)=0.967 0.773  feature_norm=38.94\n",
      "Iter 53  time=0.21  loss=11497.85 active=152117 precision=0.739  recall=0.696  F1=0.713  Acc(item/seq)=0.967 0.772  feature_norm=39.54\n",
      "Iter 54  time=0.21  loss=11419.64 active=152117 precision=0.733  recall=0.692  F1=0.707  Acc(item/seq)=0.966 0.773  feature_norm=40.40\n",
      "Iter 55  time=0.21  loss=11280.29 active=152117 precision=0.744  recall=0.707  F1=0.719  Acc(item/seq)=0.967 0.773  feature_norm=41.75\n",
      "Iter 56  time=0.21  loss=11131.39 active=152117 precision=0.746  recall=0.710  F1=0.722  Acc(item/seq)=0.968 0.773  feature_norm=42.72\n",
      "Iter 57  time=0.21  loss=11043.40 active=152117 precision=0.751  recall=0.713  F1=0.726  Acc(item/seq)=0.968 0.774  feature_norm=42.92\n",
      "Iter 58  time=0.21  loss=10954.38 active=152117 precision=0.769  recall=0.713  F1=0.736  Acc(item/seq)=0.969 0.781  feature_norm=43.18\n",
      "Iter 59  time=0.21  loss=10836.31 active=152117 precision=0.773  recall=0.713  F1=0.736  Acc(item/seq)=0.969 0.781  feature_norm=43.74\n",
      "Iter 60  time=0.21  loss=10712.24 active=152117 precision=0.779  recall=0.719  F1=0.744  Acc(item/seq)=0.970 0.788  feature_norm=44.41\n",
      "Iter 61  time=0.21  loss=10602.81 active=152117 precision=0.789  recall=0.709  F1=0.740  Acc(item/seq)=0.970 0.789  feature_norm=44.68\n",
      "Iter 62  time=0.21  loss=10508.84 active=152117 precision=0.782  recall=0.711  F1=0.739  Acc(item/seq)=0.970 0.787  feature_norm=45.37\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iter 63  time=0.21  loss=10458.88 active=152117 precision=0.783  recall=0.717  F1=0.744  Acc(item/seq)=0.970 0.788  feature_norm=45.51\n",
      "Iter 64  time=0.21  loss=10420.78 active=152117 precision=0.763  recall=0.711  F1=0.730  Acc(item/seq)=0.969 0.786  feature_norm=45.67\n",
      "Iter 65  time=0.21  loss=10315.28 active=152117 precision=0.766  recall=0.721  F1=0.735  Acc(item/seq)=0.969 0.786  feature_norm=46.30\n",
      "Iter 66  time=0.21  loss=10204.10 active=152117 precision=0.769  recall=0.728  F1=0.740  Acc(item/seq)=0.970 0.786  feature_norm=47.10\n",
      "Iter 67  time=0.21  loss=10134.54 active=152117 precision=0.769  recall=0.716  F1=0.737  Acc(item/seq)=0.970 0.787  feature_norm=47.87\n",
      "Iter 68  time=0.21  loss=10095.10 active=152117 precision=0.773  recall=0.718  F1=0.741  Acc(item/seq)=0.970 0.787  feature_norm=47.85\n",
      "Iter 69  time=0.21  loss=10059.52 active=152117 precision=0.773  recall=0.715  F1=0.738  Acc(item/seq)=0.970 0.790  feature_norm=47.81\n",
      "Iter 70  time=0.21  loss=10012.53 active=152117 precision=0.767  recall=0.712  F1=0.733  Acc(item/seq)=0.970 0.790  feature_norm=47.95\n",
      "Iter 71  time=0.21  loss=9931.41  active=152117 precision=0.765  recall=0.710  F1=0.731  Acc(item/seq)=0.970 0.791  feature_norm=48.49\n",
      "Iter 72  time=0.21  loss=9861.45  active=152117 precision=0.763  recall=0.709  F1=0.730  Acc(item/seq)=0.970 0.793  feature_norm=49.07\n",
      "Iter 73  time=0.21  loss=9803.75  active=152117 precision=0.769  recall=0.723  F1=0.741  Acc(item/seq)=0.970 0.796  feature_norm=49.32\n",
      "Iter 74  time=0.21  loss=9762.61  active=152117 precision=0.758  recall=0.720  F1=0.733  Acc(item/seq)=0.970 0.793  feature_norm=49.58\n",
      "Iter 75  time=0.21  loss=9726.47  active=152117 precision=0.761  recall=0.722  F1=0.736  Acc(item/seq)=0.970 0.790  feature_norm=49.80\n",
      "Iter 76  time=0.21  loss=9628.97  active=152117 precision=0.764  recall=0.722  F1=0.736  Acc(item/seq)=0.970 0.789  feature_norm=50.47\n",
      "Iter 77  time=0.40  loss=9586.61  active=152117 precision=0.763  recall=0.725  F1=0.740  Acc(item/seq)=0.971 0.791  feature_norm=50.96\n",
      "Iter 78  time=0.21  loss=9522.00  active=152117 precision=0.767  recall=0.723  F1=0.741  Acc(item/seq)=0.970 0.788  feature_norm=51.34\n",
      "Iter 79  time=0.21  loss=9479.87  active=152117 precision=0.765  recall=0.720  F1=0.736  Acc(item/seq)=0.970 0.788  feature_norm=51.77\n",
      "Iter 80  time=0.21  loss=9448.33  active=152117 precision=0.771  recall=0.720  F1=0.739  Acc(item/seq)=0.970 0.789  feature_norm=51.80\n",
      "Iter 81  time=0.21  loss=9423.55  active=152117 precision=0.769  recall=0.723  F1=0.740  Acc(item/seq)=0.970 0.789  feature_norm=51.83\n",
      "Iter 82  time=0.21  loss=9367.51  active=152117 precision=0.763  recall=0.720  F1=0.736  Acc(item/seq)=0.970 0.790  feature_norm=52.25\n",
      "Iter 83  time=0.21  loss=9322.98  active=152117 precision=0.762  recall=0.722  F1=0.737  Acc(item/seq)=0.970 0.793  feature_norm=52.34\n",
      "Iter 84  time=0.21  loss=9277.51  active=152117 precision=0.765  recall=0.725  F1=0.740  Acc(item/seq)=0.971 0.798  feature_norm=52.47\n",
      "Iter 85  time=0.21  loss=9229.07  active=152117 precision=0.772  recall=0.726  F1=0.743  Acc(item/seq)=0.971 0.798  feature_norm=52.79\n",
      "Iter 86  time=0.21  loss=9214.50  active=152117 precision=0.782  recall=0.729  F1=0.751  Acc(item/seq)=0.971 0.799  feature_norm=52.88\n",
      "Iter 87  time=0.21  loss=9194.55  active=152117 precision=0.785  recall=0.731  F1=0.753  Acc(item/seq)=0.971 0.800  feature_norm=52.82\n",
      "Iter 88  time=0.21  loss=9182.23  active=152117 precision=0.783  recall=0.728  F1=0.750  Acc(item/seq)=0.971 0.798  feature_norm=52.86\n",
      "Iter 89  time=0.21  loss=9161.54  active=152117 precision=0.786  recall=0.726  F1=0.750  Acc(item/seq)=0.971 0.795  feature_norm=52.98\n",
      "Iter 90  time=0.21  loss=9130.06  active=152117 precision=0.788  recall=0.728  F1=0.752  Acc(item/seq)=0.971 0.797  feature_norm=53.17\n",
      "Iter 91  time=0.21  loss=9063.09  active=152117 precision=0.794  recall=0.738  F1=0.760  Acc(item/seq)=0.972 0.800  feature_norm=53.64\n",
      "Iter 92  time=0.40  loss=9041.98  active=152117 precision=0.791  recall=0.739  F1=0.759  Acc(item/seq)=0.972 0.801  feature_norm=53.77\n",
      "Iter 93  time=0.21  loss=9010.27  active=152117 precision=0.783  recall=0.736  F1=0.754  Acc(item/seq)=0.972 0.801  feature_norm=53.97\n",
      "Iter 94  time=0.21  loss=8984.80  active=152117 precision=0.786  recall=0.740  F1=0.759  Acc(item/seq)=0.972 0.803  feature_norm=54.09\n",
      "Iter 95  time=0.21  loss=8965.14  active=152117 precision=0.788  recall=0.740  F1=0.760  Acc(item/seq)=0.972 0.803  feature_norm=54.14\n",
      "Iter 96  time=0.21  loss=8931.00  active=152117 precision=0.786  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.804  feature_norm=54.26\n",
      "Iter 97  time=0.21  loss=8923.90  active=152117 precision=0.791  recall=0.737  F1=0.757  Acc(item/seq)=0.971 0.799  feature_norm=54.68\n",
      "Iter 98  time=0.21  loss=8860.78  active=152117 precision=0.784  recall=0.735  F1=0.755  Acc(item/seq)=0.971 0.802  feature_norm=54.64\n",
      "Iter 99  time=0.21  loss=8846.13  active=152117 precision=0.788  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.803  feature_norm=54.68\n",
      "Iter 100 time=0.21  loss=8832.34  active=152117 precision=0.789  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.803  feature_norm=54.77\n",
      "================================================\n",
      "Label      Precision    Recall     F1    Support\n",
      "-------  -----------  --------  -----  ---------\n",
      "B-LOC          0.823     0.823  0.823        479\n",
      "B-MISC         0.806     0.651  0.720        748\n",
      "B-ORG          0.853     0.620  0.718        686\n",
      "B-PER          0.752     0.855  0.800        703\n",
      "I-LOC          0.583     0.547  0.565         64\n",
      "I-MISC         0.645     0.507  0.568        215\n",
      "I-ORG          0.806     0.684  0.740        396\n",
      "I-PER          0.844     0.946  0.892        423\n",
      "O              0.990     0.998  0.994      33973\n",
      "------------------------------------------------\n",
      "L-BFGS terminated with the maximum number of iterations\n",
      "Total seconds required for training: 21.768\n",
      "\n",
      "Storing the model\n",
      "Number of active features: 152117 (152117)\n",
      "Number of active attributes: 130306 (145602)\n",
      "Number of active labels: 9 (9)\n",
      "Writing labels\n",
      "Writing attributes\n",
      "Writing feature references for transitions\n",
      "Writing feature references for attributes\n",
      "Seconds required: 0.058\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "CRF(algorithm='lbfgs', all_possible_states=None,\n",
       "  all_possible_transitions=None, averaging=None, c=None, c1=None, c2=None,\n",
       "  calibration_candidates=None, calibration_eta=None,\n",
       "  calibration_max_trials=None, calibration_rate=None,\n",
       "  calibration_samples=None, delta=None, epsilon=None, error_sensitive=None,\n",
       "  gamma=None, keep_tempfiles=None, linesearch=None, max_iterations=100,\n",
       "  max_linesearch=None, min_freq=None, model_filename=None,\n",
       "  num_memories=None, pa_type=None, period=None, trainer_cls=None,\n",
       "  variance=None, verbose='true')"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "crf = crfsuite.CRF(\n",
    "    verbose='true',\n",
    "    algorithm='lbfgs',\n",
    "    max_iterations=100\n",
    ")\n",
    "\n",
    "crf.fit(X_train, y_train, X_dev=X_dev, y_dev=y_dev)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['models/ner/crf_model']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import joblib\n",
    "import os\n",
    "\n",
    "OUTPUT_PATH = \"models/ner/\"\n",
    "OUTPUT_FILE = \"crf_model\"\n",
    "\n",
    "if not os.path.exists(OUTPUT_PATH):\n",
    "    os.mkdir(OUTPUT_PATH)\n",
    "\n",
    "joblib.dump(crf, os.path.join(OUTPUT_PATH, OUTPUT_FILE))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's evaluate the output of our CRF. We'll load the model from the output file above and have it predict labels for the full test set.\n",
    "\n",
    "As a sanity check, let's take a look at its predictions for the first test sentence. This output looks pretty good: the CRF is able to predict all four locations in the sentence correctly. It only misses the person entity, which is a strange case anyway, because it is not actually a person name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sentence: Dat is in Italië , Spanje of Engeland misschien geen probleem , maar volgens ' Der Kaiser ' in Duitsland wel .\n",
      "Predicted: O O O B-LOC O B-LOC O B-LOC O O O O O O O B-MISC I-MISC O O B-LOC O O\n",
      "Correct:   O O O B-LOC O B-LOC O B-LOC O O O O O O O B-PER I-PER O O B-LOC O O\n"
     ]
    }
   ],
   "source": [
    "crf = joblib.load(os.path.join(OUTPUT_PATH, OUTPUT_FILE))\n",
    "y_pred = crf.predict(X_test)\n",
    "\n",
    "example_sent = test_sents[0]\n",
    "\n",
    "print(\"Sentence:\", ' '.join(sent2tokens(example_sent)))\n",
    "print(\"Predicted:\", ' '.join(crf.predict([sent2features(example_sent, word2cluster)])[0]))\n",
    "print(\"Correct:  \", ' '.join(sent2labels(example_sent)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we evaluate on the full test set. We'll print out a classification report for all labels except `O`. If we were to include `O`, which far outnumbers the entity labels in our data, the average scores would be inflated artificially, simply because there's an inherently high probability that the `O` labels from our CRF are correct. We obtain an average F-score of 77% (micro average) across all entity types, with particularly good results for `B-LOC`and `B-PER`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "       B-LOC       0.83      0.83      0.83       774\n",
      "       I-LOC       0.29      0.41      0.34        49\n",
      "      B-MISC       0.84      0.61      0.71      1187\n",
      "      I-MISC       0.59      0.42      0.49       410\n",
      "       B-ORG       0.80      0.69      0.74       882\n",
      "       I-ORG       0.74      0.66      0.70       551\n",
      "       B-PER       0.80      0.90      0.85      1098\n",
      "       I-PER       0.87      0.95      0.91       807\n",
      "\n",
      "   micro avg       0.80      0.74      0.77      5758\n",
      "   macro avg       0.72      0.68      0.70      5758\n",
      "weighted avg       0.80      0.74      0.76      5758\n",
      "\n"
     ]
    }
   ],
   "source": [
    "labels = list(crf.classes_)\n",
    "labels.remove(\"O\")\n",
    "y_pred = crf.predict(X_test)\n",
    "sorted_labels = sorted(\n",
    "    labels,\n",
    "    key=lambda name: (name[1:], name[0])\n",
    ")\n",
    "\n",
    "print(metrics.flat_classification_report(y_test, y_pred, labels=sorted_labels))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can also look at the most likely transitions the CRF has identified, and at the top features for every label. We'll do this with the `eli5` library, which helps us explain the predictions of machine learning models.\n",
    "\n",
    "The top transitions are quite intuitive: the most likely transitions are those within the same entity type (from a B-label to an O-label), and those where a B-label follows an O-label. \n",
    "\n",
    "The features, too, make sense. For example, if a word does not start with an uppercase letter, it is unlikely to be an entity. By contrast, a word is very likely to be a location if it ends in `ië`, which is indeed a very common suffix for locations in Dutch. Notice also how informative the embedding clusters are: for all entity types, the word clusters form some of the most informative features for the CRF. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "<table class=\"eli5-transition-features\" style=\"margin-bottom: 0.5em;\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <td>From \\ To</td>\n",
       "            \n",
       "                <th>O</th>\n",
       "            \n",
       "                <th>B-LOC</th>\n",
       "            \n",
       "                <th>I-LOC</th>\n",
       "            \n",
       "                <th>B-MISC</th>\n",
       "            \n",
       "                <th>I-MISC</th>\n",
       "            \n",
       "                <th>B-ORG</th>\n",
       "            \n",
       "                <th>I-ORG</th>\n",
       "            \n",
       "                <th>B-PER</th>\n",
       "            \n",
       "                <th>I-PER</th>\n",
       "            \n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        \n",
       "            \n",
       "                <tr>\n",
       "                    <th>O</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 88.09%)\" title=\"O &rArr; O\">\n",
       "                            4.141\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 87.21%)\" title=\"O &rArr; B-LOC\">\n",
       "                            4.583\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"O &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 88.09%)\" title=\"O &rArr; B-MISC\">\n",
       "                            4.141\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"O &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 87.64%)\" title=\"O &rArr; B-ORG\">\n",
       "                            4.366\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"O &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 88.74%)\" title=\"O &rArr; B-PER\">\n",
       "                            3.819\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"O &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>B-LOC</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 98.34%)\" title=\"B-LOC &rArr; O\">\n",
       "                            -0.248\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 98.20%)\" title=\"B-LOC &rArr; B-LOC\">\n",
       "                            -0.279\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 82.62%)\" title=\"B-LOC &rArr; I-LOC\">\n",
       "                            7.101\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-LOC &rArr; B-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-LOC &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-LOC &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-LOC &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 96.70%)\" title=\"B-LOC &rArr; B-PER\">\n",
       "                            -0.661\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-LOC &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>I-LOC</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 95.40%)\" title=\"I-LOC &rArr; O\">\n",
       "                            -1.062\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 98.40%)\" title=\"I-LOC &rArr; B-LOC\">\n",
       "                            -0.235\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 84.62%)\" title=\"I-LOC &rArr; I-LOC\">\n",
       "                            5.967\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; B-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; B-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-LOC &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>B-MISC</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 95.64%)\" title=\"B-MISC &rArr; O\">\n",
       "                            -0.985\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 96.73%)\" title=\"B-MISC &rArr; B-LOC\">\n",
       "                            0.655\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-MISC &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 98.03%)\" title=\"B-MISC &rArr; B-MISC\">\n",
       "                            -0.316\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 81.56%)\" title=\"B-MISC &rArr; I-MISC\">\n",
       "                            7.73\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 97.10%)\" title=\"B-MISC &rArr; B-ORG\">\n",
       "                            0.551\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-MISC &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 97.44%)\" title=\"B-MISC &rArr; B-PER\">\n",
       "                            0.46\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-MISC &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>I-MISC</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 93.40%)\" title=\"I-MISC &rArr; O\">\n",
       "                            -1.781\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-MISC &rArr; B-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-MISC &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 97.75%)\" title=\"I-MISC &rArr; B-MISC\">\n",
       "                            -0.382\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 81.49%)\" title=\"I-MISC &rArr; I-MISC\">\n",
       "                            7.769\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 95.16%)\" title=\"I-MISC &rArr; B-ORG\">\n",
       "                            1.145\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-MISC &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 96.50%)\" title=\"I-MISC &rArr; B-PER\">\n",
       "                            -0.719\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-MISC &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>B-ORG</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 98.28%)\" title=\"B-ORG &rArr; O\">\n",
       "                            -0.261\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-ORG &rArr; B-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-ORG &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 96.20%)\" title=\"B-ORG &rArr; B-MISC\">\n",
       "                            -0.809\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-ORG &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-ORG &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 81.44%)\" title=\"B-ORG &rArr; I-ORG\">\n",
       "                            7.803\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 99.09%)\" title=\"B-ORG &rArr; B-PER\">\n",
       "                            0.106\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-ORG &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>I-ORG</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 96.25%)\" title=\"I-ORG &rArr; O\">\n",
       "                            -0.794\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; B-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; B-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 82.50%)\" title=\"I-ORG &rArr; I-ORG\">\n",
       "                            7.174\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 99.22%)\" title=\"I-ORG &rArr; B-PER\">\n",
       "                            0.084\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-ORG &rArr; I-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>B-PER</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 98.06%)\" title=\"B-PER &rArr; O\">\n",
       "                            0.31\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 97.90%)\" title=\"B-PER &rArr; B-LOC\">\n",
       "                            -0.346\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-PER &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 96.88%)\" title=\"B-PER &rArr; B-MISC\">\n",
       "                            -0.611\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-PER &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-PER &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"B-PER &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 94.40%)\" title=\"B-PER &rArr; B-PER\">\n",
       "                            -1.408\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 80.00%)\" title=\"B-PER &rArr; I-PER\">\n",
       "                            8.68\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "                <tr>\n",
       "                    <th>I-PER</th>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 99.10%)\" title=\"I-PER &rArr; O\">\n",
       "                            0.104\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; B-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; I-LOC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; B-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; I-MISC\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; B-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; I-ORG\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(0, 100.00%, 100.00%)\" title=\"I-PER &rArr; B-PER\">\n",
       "                            0.0\n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"background-color: hsl(120, 100.00%, 83.13%)\" title=\"I-PER &rArr; I-PER\">\n",
       "                            6.804\n",
       "                        </td>\n",
       "                    \n",
       "                </tr>\n",
       "            \n",
       "        \n",
       "    </tbody>\n",
       "</table>\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "        <table class=\"eli5-weights-wrapper\" style=\"border-collapse: collapse; border: none; margin-bottom: 1.5em;\">\n",
       "            <tr>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=O\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=B-LOC\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=I-LOC\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=B-MISC\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=I-MISC\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=B-ORG\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=I-ORG\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=B-PER\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "                    <td style=\"padding: 0.5em; border: 1px solid black; text-align: center;\">\n",
       "                        <b>\n",
       "    \n",
       "        y=I-PER\n",
       "    \n",
       "</b>\n",
       "\n",
       "top features\n",
       "                    </td>\n",
       "                \n",
       "            </tr>\n",
       "            <tr>\n",
       "                \n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.692\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.istitle=False\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 81.61%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.312\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.isupper=False\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.211\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=&quot;\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.08%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.999\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=+\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.84%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.835\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        EOS\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.90%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.820\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        BOS\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.28%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.740\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=158\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.54%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.684\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=415\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.99%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.590\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ag\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.01%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.588\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=195\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.556\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=dag\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.520\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=185\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.520\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=178\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.53%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.481\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=24\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.55%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.477\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=177\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.56%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.475\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=u\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.56%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 108173 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 89.36%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 6671 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 89.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.515\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=375\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 89.25%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.538\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=ronde\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 89.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.561\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=(\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 88.86%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.618\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        0\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 88.50%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.693\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=de\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 88.28%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.740\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=N\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.96%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.808\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=&#x27;s\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.843\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ple\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.75%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.853\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=Misc\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.47%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.914\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=6\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.945\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=1\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.13%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.990\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=111\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 86.78%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -2.068\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.isupper=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 85.12%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -2.447\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.istitle=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.734\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=325\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.31%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.650\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=375\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 81.01%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.466\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=68\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 82.17%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.169\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=139\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.99%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.020\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=in\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.10%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.995\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=143\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.973\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=476\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.87%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.828\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=102\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.87%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.617\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ië\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.25%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.538\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=(\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.516\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=uit\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.46%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.495\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=154\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.441\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=/\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.85%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.417\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=-\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.91%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.404\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=tel.\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.04%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.379\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=het\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.07%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.374\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=440\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.09%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.369\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=sint-michiels\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.25%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.337\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=urs\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.320\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=futuroscope\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.301\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ope\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.301\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=363\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.56%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.278\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=vst\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.56%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.278\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=VSt\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.257\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=den\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.245\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=146\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.244\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=116\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.90%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.212\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=St\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.98%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.198\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=492\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.98%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 4441 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.33%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 668 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.322\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=ronde\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.10%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.778\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=238\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.50%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.488\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=m\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.10%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.367\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=161\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.07%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.996\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=col\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.17%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.977\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=rk\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.89%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.852\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.istitle=False\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.07%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.821\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=al\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.810\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=york\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.810\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ork\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.809\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=eum\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.31%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.781\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=san\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.703\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=abeba\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.703\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=addis\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.702\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=eba\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.701\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.isupper=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.91%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.683\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=&#x27;\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.92%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.682\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=,\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.677\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=den\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.676\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=aan\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.97%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.674\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=um\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.99%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.670\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=staten\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.669\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=38\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.05%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.661\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=Prep\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.12%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.650\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=col\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.12%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.650\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Art\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.13%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.648\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=museum\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.15%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.645\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ba\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.643\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=hotel\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.16%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 1294 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.97%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 131 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.97%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.674\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        EOS\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.31%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.781\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=Adj\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 81.59%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.318\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=23\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 84.92%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.494\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=100\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 85.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.419\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=39\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.65%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.097\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=1\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.91%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.039\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=294\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.92%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.036\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=338\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.06%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.786\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=&#x27;s\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.13%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.772\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=11\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.47%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.700\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=sport\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.646\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=buitenland\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.81%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.628\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=se\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.86%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.618\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=222\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.27%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.535\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=427\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.41%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.505\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=journaal\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.54%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.478\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=40\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.85%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.417\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=nse\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.88%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.411\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=Adj\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.06%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.374\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=111\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.19%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.350\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=aula\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.347\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=tobin-taks\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.30%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.329\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=218\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.301\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=128\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.58%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.274\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ula\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.84%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.223\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        BOS\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.90%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.213\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=tobin-heffing\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.94%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.204\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=ronde\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.94%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.204\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ks\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.98%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.197\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=v-plan\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.05%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.184\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ex\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.05%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 5588 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.67%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 853 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.256\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.isupper=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.04%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.792\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=ronde\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.54%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.684\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.isupper=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.11%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.567\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=ronde\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.28%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.332\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=37\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.32%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.323\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=ned\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.316\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=325\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.45%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.298\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=1\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.58%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.274\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=leven\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.89%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.215\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.istitle=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.96%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.201\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Num\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.06%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.182\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=euro\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.17%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.161\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=86\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.147\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=bijsluiter\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.48%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.103\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=financiële\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.50%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.100\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=ned\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.52%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.096\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ap\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.068\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=Num\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.82%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.042\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=413\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.89%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.029\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=00\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.98%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.012\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=de\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.99%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.010\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Prep\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.979\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=279\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.31%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.954\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=travel\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.950\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=formule\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.947\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=financiële\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.945\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=brussel\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.52%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.916\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=&#x27;\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.52%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 3172 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.42%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 465 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.42%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.115\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=V\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 89.60%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.467\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:postag=Num\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 83.03%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -2.953\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Adj\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 83.66%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.798\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=424\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 84.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.635\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=228\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.54%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.121\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=com\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.12%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.991\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=187\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.974\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=quizpeople\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.922\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ple\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.78%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.848\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=morgen\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.01%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.798\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=250\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.55%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.683\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=83\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.560\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=29\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.25%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.538\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ga\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.31%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.525\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        BOS\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.500\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=bel\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.55%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.476\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=minister\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.82%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.422\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=207\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.85%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.418\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=249\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.87%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.412\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=waterleau\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.88%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.411\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.isupper=True\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.02%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.384\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=belga\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.02%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.384\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=lga\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.30%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.328\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=our\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.320\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=om\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.35%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.319\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=to\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.45%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.299\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=freenet\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.56%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.277\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=baan\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.76%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.240\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=bij\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.82%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.227\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=lux\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.82%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.227\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=co\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.85%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.222\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=21\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.85%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 3591 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.20%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 469 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.156\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.isupper=False\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.25%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.338\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=morgen\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.42%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.304\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=403\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.74%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.243\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=413\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.96%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.200\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=vlaams\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.28%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.141\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=187\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.39%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.120\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=gen\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.49%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.101\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ion\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.057\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=radio\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.89%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.028\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=321\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.22%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.970\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=ned\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.91%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.849\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=nt\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.841\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=143\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.17%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.805\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:word.lower=voor\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.39%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.767\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=375\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.40%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.767\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Misc\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.42%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.763\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=es\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.760\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=3\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.760\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=3\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.760\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=3\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.53%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.745\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=478\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.60%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.733\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=411\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.69%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.719\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ey\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.77%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.705\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ola\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.702\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=le\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.79%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 2209 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.33%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 250 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.777\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -2:postag=V\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.793\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=in\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.06%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.824\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=financiële\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 92.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.842\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Num\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.55%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.090\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        0\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.162\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=V\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +3.523\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=489\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 83.29%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.888\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=204\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 83.58%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.818\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=301\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 83.63%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.804\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=3\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 83.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.765\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=246\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 85.13%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.444\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=337\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 85.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.419\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=6\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 85.49%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.361\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=326\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.19%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.199\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=296\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.77%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +2.069\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=87\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.975\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=349\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.727\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=12\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.07%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.575\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=105\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.10%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.569\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=350\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.69%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.448\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=bode\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.74%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.439\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=85\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.20%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.347\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ov\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.340\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=(\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.27%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.333\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=-\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.34%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.320\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=111\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.41%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.307\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=--\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.301\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ff\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.300\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=V\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.71%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.248\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=án\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.72%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.247\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=volgens\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.231\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=par\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.80%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 7464 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.19%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 851 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.19%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.349\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ing\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.14%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.360\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ur\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 88.90%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.610\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=in\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.37%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.937\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Art\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                        <td style=\"padding: 0px; border: 1px solid black; vertical-align: top;\">\n",
       "                            \n",
       "                                \n",
       "                                    \n",
       "                                    \n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; width: 100%;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature weights. Note that weights do not account for feature value scales, so if feature values have different scales, features with highest weights might not be the most important.\">\n",
       "                    Weight<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 88.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.748\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=van\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.81%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.425\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=3\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.38%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.313\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=388\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.51%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.287\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=450\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.70%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.250\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=6\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.231\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=249\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.22%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.152\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +1:word.lower=(\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.35%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.127\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        +2:word.lower=die\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.044\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=gucht\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.21%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.970\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=337\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.46%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.927\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=pu\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.46%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.927\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=nfu\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.46%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.927\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=shenfu\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.47%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.925\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=fu\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.49%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.921\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=296\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.64%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.895\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=grauwe\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.76%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.875\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=uwe\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.79%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.869\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=we\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.867\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:word.lower=de\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.841\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ck\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.11%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.815\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        -1:postag=Pron\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.11%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.814\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.lower=den\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 93.11%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 4971 more positive &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.11%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 548 more negative &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.11%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.814\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=ma\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.05%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.824\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=aan\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 92.90%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.851\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=al\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 92.26%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.961\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.cluster=238\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.87%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.032\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word.istitle=False\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.74%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.055\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        postag=V\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.65%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.071\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-3:]=ter\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.61%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.079\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        word[-2:]=um\n",
       "    </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "                                \n",
       "                            \n",
       "                        </td>\n",
       "                    \n",
       "                \n",
       "            </tr>\n",
       "        </table>\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import eli5\n",
    "\n",
    "eli5.show_weights(crf, top=30)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding the optimal hyperparameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So far we've trained a model with the default parameters. It's unlikely that these will give us the best performance possible. Therefore we're going to search automatically for the best hyperparameter settings by iteratively training different models and evaluating them. Eventually we'll pick the best one.\n",
    "\n",
    "Here we'll focus on two parameters: `c1` and `c2`. These are the parameters for L1 and L2 regularization, respectively. Regularization prevents overfitting on the training data by adding a penalty to the loss function. In L1 regularization, this penalty is the sum of the absolute values of the weights; in L2 regularization, it is the sum of the squared weights. L1 regularization performs a type of feature selection, as it assigns 0 weight to irrelevant features. L2 regularization, by contrast, makes the weight of irrelevant features small, but not necessarily zero. L1 regularization is often called the Lasso method, L2 is called the Ridge method, and the linear combination of both is called Elastic Net regularization.\n",
    "\n",
    "We define the parameter space for c1 and c2 and use the flat F1-score to compare the individual models. We'll rely on three-fold cross validation to score each of the 50 candidates. We use a randomized search, which means we're not going to try out all specified parameter settings, but instead, we'll let the process sample randomly from the distributions we've specified in the parameter space. It will do this 50 (`n_iter`) times. This process takes a while, but it's worth the wait."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 50 candidates, totalling 150 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.\n",
      "[Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:  3.2min\n",
      "[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 24.3min finished\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "RandomizedSearchCV(cv=3, error_score='raise-deprecating',\n",
       "          estimator=CRF(algorithm='lbfgs', all_possible_states=None,\n",
       "  all_possible_transitions=True, averaging=None, c=None, c1=None, c2=None,\n",
       "  calibration_candidates=None, calibration_eta=None,\n",
       "  calibration_max_trials=None, calibration_rate=None,\n",
       "  calibration_samples=None, delta=None, epsilon=None, error...e,\n",
       "  num_memories=None, pa_type=None, period=None, trainer_cls=None,\n",
       "  variance=None, verbose=False),\n",
       "          fit_params=None, iid='warn', n_iter=50, n_jobs=-1,\n",
       "          param_distributions={'c1': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9947f04e10>, 'c2': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9947f04c88>},\n",
       "          pre_dispatch='2*n_jobs', random_state=None, refit=True,\n",
       "          return_train_score='warn',\n",
       "          scoring=make_scorer(flat_f1_score, average=weighted, labels=['B-ORG', 'B-MISC', 'B-PER', 'I-PER', 'B-LOC', 'I-MISC', 'I-ORG', 'I-LOC']),\n",
       "          verbose=1)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import scipy\n",
    "from sklearn.metrics import make_scorer\n",
    "from sklearn.model_selection import RandomizedSearchCV\n",
    "\n",
    "crf = crfsuite.CRF(\n",
    "    algorithm='lbfgs',\n",
    "    max_iterations=100,\n",
    "    all_possible_transitions=True\n",
    ")\n",
    "\n",
    "params_space = {\n",
    "    'c1': scipy.stats.expon(scale=0.5),\n",
    "    'c2': scipy.stats.expon(scale=0.05),\n",
    "}\n",
    "\n",
    "f1_scorer = make_scorer(metrics.flat_f1_score,\n",
    "                        average='weighted', labels=labels)\n",
    "\n",
    "rs = RandomizedSearchCV(crf, params_space,\n",
    "                        cv=3,\n",
    "                        verbose=1,\n",
    "                        n_jobs=-1,\n",
    "                        n_iter=50,\n",
    "                        scoring=f1_scorer)\n",
    "rs.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a look at the best hyperparameter settings. Our random search suggests a combination of L1 and L2 normalization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best params: {'c1': 0.08869645933566639, 'c2': 0.005642379370340676}\n",
      "best CV score: 0.7608794798691931\n",
      "model size: 1.06M\n"
     ]
    }
   ],
   "source": [
    "print('best params:', rs.best_params_)\n",
    "print('best CV score:', rs.best_score_)\n",
    "print('model size: {:0.2f}M'.format(rs.best_estimator_.size_ / 1000000))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find out what precision, recall and F1-score this translates to, we take the best estimator from our random search and evaluate it on the test set. This indeed shows a nice improvement from our initial model. We've gone from an average F1-score of 77% to 79.1%. Both precision and recall have improved, and we see a positive result for all four entity types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "       B-LOC      0.849     0.863     0.856       774\n",
      "       I-LOC      0.359     0.571     0.441        49\n",
      "      B-MISC      0.847     0.622     0.717      1187\n",
      "      I-MISC      0.664     0.415     0.511       410\n",
      "       B-ORG      0.806     0.727     0.764       882\n",
      "       I-ORG      0.772     0.677     0.721       551\n",
      "       B-PER      0.834     0.903     0.867      1098\n",
      "       I-PER      0.892     0.958     0.924       807\n",
      "\n",
      "   micro avg      0.823     0.761     0.791      5758\n",
      "   macro avg      0.753     0.717     0.725      5758\n",
      "weighted avg      0.820     0.761     0.784      5758\n",
      "\n"
     ]
    }
   ],
   "source": [
    "best_crf = rs.best_estimator_\n",
    "y_pred = best_crf.predict(X_test)\n",
    "print(metrics.flat_classification_report(\n",
    "    y_test, y_pred, labels=sorted_labels, digits=3\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conditional Random Fields have lost some of their popularity since the advent of neural-network models. Still, they can be very effective for named entity recognition, particularly when word embedding information is taken into account.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
关于此算法

使用条件随机场进行命名实体识别

自然语言处理中的一个经典挑战是序列标注。在序列标注中,目标是为文本中的每个单词标记一个词类。在词性标注中,这些词类是词性,例如名词或动词。在命名实体识别(NER)中,它们是通用命名实体的类型,例如位置、人物或组织,或更专业的实体,例如医疗保健领域中的疾病或症状。通过这种方式,序列标注可以帮助我们从文本中提取最重要的信息,并提高分析、搜索或匹配应用程序的性能。

在本笔记本中,我们将探索条件随机场,这在深度学习出现之前是最流行的序列标注方法。深度学习现在可能吸引了所有人的注意力,但条件随机场仍然是构建简单序列标注器的强大工具。

我们将使用 sklearn-crfsuite 工具。这是一个围绕 python-crfsuite 的包装器,而 python-crfsuite 本身是 CRFSuite 的 Python 绑定。我们使用 sklearn-crfsuite 的原因是它提供了一些方便的实用函数,例如用于评估模型的输出。您可以使用 pip install sklearn-crfsuite 安装它。

数据

首先,我们获取一些数据。一个著名的用于训练和测试 NER 模型的数据集是 CoNLL-2002 数据,它包含带有四种实体类型标签的西班牙语和荷兰语文本:位置(LOC)、人物(PER)、组织(ORG)和杂项实体(MISC)。这两个语料库都分为三个部分:一个训练部分和两个较小的测试部分,其中一个我们将用作开发数据。从 NLTK 收集数据很容易。

import nltk
import sklearn
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelBinarizer
import sklearn_crfsuite as crfsuite
from sklearn_crfsuite import metrics
train_sents = list(nltk.corpus.conll2002.iob_sents('ned.train'))
dev_sents = list(nltk.corpus.conll2002.iob_sents('ned.testa'))
test_sents = list(nltk.corpus.conll2002.iob_sents('ned.testb'))

数据包含一个标记化句子的列表。对于每个标记,我们都有它本身的字符串、它的词性标签和它的实体标签,它遵循 BIO 约定。在我们今天所处的深度学习世界中,忽略词性标签很常见。但是,由于 CRF 依赖于良好的特征提取,我们将很乐意利用这些信息。毕竟,单词的词性告诉我们很多关于它作为命名实体的可能状态的信息:例如,名词比动词更有可能是实体。

train_sents[0]
[(&#x27;De&#x27;, &#x27;Art&#x27;, &#x27;O&#x27;),
 (&#x27;tekst&#x27;, &#x27;N&#x27;, &#x27;O&#x27;),
 (&#x27;van&#x27;, &#x27;Prep&#x27;, &#x27;O&#x27;),
 (&#x27;het&#x27;, &#x27;Art&#x27;, &#x27;O&#x27;),
 (&#x27;arrest&#x27;, &#x27;N&#x27;, &#x27;O&#x27;),
 (&#x27;is&#x27;, &#x27;V&#x27;, &#x27;O&#x27;),
 (&#x27;nog&#x27;, &#x27;Adv&#x27;, &#x27;O&#x27;),
 (&#x27;niet&#x27;, &#x27;Adv&#x27;, &#x27;O&#x27;),
 (&#x27;schriftelijk&#x27;, &#x27;Adj&#x27;, &#x27;O&#x27;),
 (&#x27;beschikbaar&#x27;, &#x27;Adj&#x27;, &#x27;O&#x27;),
 (&#x27;maar&#x27;, &#x27;Conj&#x27;, &#x27;O&#x27;),
 (&#x27;het&#x27;, &#x27;Art&#x27;, &#x27;O&#x27;),
 (&#x27;bericht&#x27;, &#x27;N&#x27;, &#x27;O&#x27;),
 (&#x27;werd&#x27;, &#x27;V&#x27;, &#x27;O&#x27;),
 (&#x27;alvast&#x27;, &#x27;Adv&#x27;, &#x27;O&#x27;),
 (&#x27;bekendgemaakt&#x27;, &#x27;V&#x27;, &#x27;O&#x27;),
 (&#x27;door&#x27;, &#x27;Prep&#x27;, &#x27;O&#x27;),
 (&#x27;een&#x27;, &#x27;Art&#x27;, &#x27;O&#x27;),
 (&#x27;communicatiebureau&#x27;, &#x27;N&#x27;, &#x27;O&#x27;),
 (&#x27;dat&#x27;, &#x27;Conj&#x27;, &#x27;O&#x27;),
 (&#x27;Floralux&#x27;, &#x27;N&#x27;, &#x27;B-ORG&#x27;),
 (&#x27;inhuurde&#x27;, &#x27;V&#x27;, &#x27;O&#x27;),
 (&#x27;.&#x27;, &#x27;Punc&#x27;, &#x27;O&#x27;)]

特征提取

虽然如今人们期望神经网络自己学习输入文本的相关特征,但这与条件随机场完全不同。CRF 学习我们给它们的特征与给定上下文中标记的标签之间的关系。它们不会自己学习这些特征。相反,模型的质量将高度依赖于我们向它展示的特征的相关性。

因此,本教程中最重要的方法是收集每个标记的特征的方法。哪些信息可能有用?当然,单词本身,连同它的词性标签。了解单词是否完全大写、是否以大写字母开头或是否为数字也可能很有趣。此外,我们还查看了单词结尾的字符二元语法和三元语法。我们还为每个标记提供了一个 bias 特征,它始终具有相同的值。此偏差特征有助于 CRF 学习每个标签类型在训练数据中的相对频率。

为了向 CRF 提供更多关于单词含义的信息,我们还引入了来自词嵌入的信息。在我们的 词嵌入笔记本 中,我们在荷兰语维基百科上训练了词嵌入,并将它们聚类到 500 个集群中。在这里,我们将从文件读取这 500 个集群,并将每个单词映射到它所属集群的 ID。这对命名实体识别非常有用,因为大多数实体类型都聚类在一起。这允许 CRF 在词级以上进行概括。例如,当 CRF 遇到一个从未见过的单词(例如,Albania)时,它可以根据单词所属的集群做出决定。如果此集群包含 CRF 在其训练数据中遇到的许多其他实体(例如,ItalyGermanyFrance),它将学习此集群与特定实体类型之间的字符串链接。因此,它仍然可以将该实体类型分配给未知单词。在我们的实验中,仅此一项特征就将性能提升了约 3%。

最后,除了标记本身之外,我们还希望 CRF 查看其上下文。更具体地说,我们将为它提供一些关于目标单词左侧和右侧两个单词的额外信息。我们将告诉 CRF 这些单词是什么、它们是否以大写字母开头或是否完全大写,并提供它们的词性标签。如果没有左或右上下文,我们将通知 CRF 该标记位于句子的开头或结尾(BOSEOS)。

def read_clusters(cluster_file):
    word2cluster = {}
    with open(cluster_file) as i:
        for line in i:
            word, cluster = line.strip().split('\t')
            word2cluster[word] = cluster
    return word2cluster


def word2features(sent, i, word2cluster):
    word = sent[i][0]
    postag = sent[i][1]
    features = [
        'bias',
        'word.lower=' + word.lower(),
        'word[-3:]=' + word[-3:],
        'word[-2:]=' + word[-2:],
        'word.isupper=%s' % word.isupper(),
        'word.istitle=%s' % word.istitle(),
        'word.isdigit=%s' % word.isdigit(),
        'word.cluster=%s' % word2cluster[word.lower()] if word.lower() in word2cluster else "0",
        'postag=' + postag
    ]
    if i &gt; 0:
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.extend([
            '-1:word.lower=' + word1.lower(),
            '-1:word.istitle=%s' % word1.istitle(),
            '-1:word.isupper=%s' % word1.isupper(),
            '-1:postag=' + postag1
        ])
    else:
        features.append('BOS')

    if i &gt; 1: 
        word2 = sent[i-2][0]
        postag2 = sent[i-2][1]
        features.extend([
            '-2:word.lower=' + word2.lower(),
            '-2:word.istitle=%s' % word2.istitle(),
            '-2:word.isupper=%s' % word2.isupper(),
            '-2:postag=' + postag2
        ])        

        
    if i &lt; len(sent)-1:
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.extend([
            '+1:word.lower=' + word1.lower(),
            '+1:word.istitle=%s' % word1.istitle(),
            '+1:word.isupper=%s' % word1.isupper(),
            '+1:postag=' + postag1
        ])
    else:
        features.append('EOS')

    if i &lt; len(sent)-2:
        word2 = sent[i+2][0]
        postag2 = sent[i+2][1]
        features.extend([
            '+2:word.lower=' + word2.lower(),
            '+2:word.istitle=%s' % word2.istitle(),
            '+2:word.isupper=%s' % word2.isupper(),
            '+2:postag=' + postag2
        ])

        
    return features


def sent2features(sent, word2cluster):
    return [word2features(sent, i, word2cluster) for i in range(len(sent))]

def sent2labels(sent):
    return [label for token, postag, label in sent]

def sent2tokens(sent):
    return [token for token, postag, label in sent]

word2cluster = read_clusters("data/embeddings/clusters_nl.tsv")
sent2features(train_sents[0], word2cluster)[0]
[&#x27;bias&#x27;,
 &#x27;word.lower=de&#x27;,
 &#x27;word[-3:]=De&#x27;,
 &#x27;word[-2:]=De&#x27;,
 &#x27;word.isupper=False&#x27;,
 &#x27;word.istitle=True&#x27;,
 &#x27;word.isdigit=False&#x27;,
 &#x27;word.cluster=38&#x27;,
 &#x27;postag=Art&#x27;,
 &#x27;BOS&#x27;,
 &#x27;+1:word.lower=tekst&#x27;,
 &#x27;+1:word.istitle=False&#x27;,
 &#x27;+1:word.isupper=False&#x27;,
 &#x27;+1:postag=N&#x27;,
 &#x27;+2:word.lower=van&#x27;,
 &#x27;+2:word.istitle=False&#x27;,
 &#x27;+2:word.isupper=False&#x27;,
 &#x27;+2:postag=Prep&#x27;]
X_train = [sent2features(s, word2cluster) for s in train_sents]
y_train = [sent2labels(s) for s in train_sents]

X_dev = [sent2features(s, word2cluster) for s in dev_sents]
y_dev = [sent2labels(s) for s in dev_sents]

X_test = [sent2features(s, word2cluster) for s in test_sents]
y_test = [sent2labels(s) for s in test_sents]

训练

我们现在创建一个 CRF 模型并对其进行训练。我们将使用标准的 L-BFGS 算法进行参数估计,并运行 100 次迭代。完成后,我们将使用 joblib 保存模型。

crf = crfsuite.CRF(
    verbose='true',
    algorithm='lbfgs',
    max_iterations=100
)

crf.fit(X_train, y_train, X_dev=X_dev, y_dev=y_dev)
loading training data to CRFsuite: 100%|██████████| 15806/15806 [00:02&amp;lt;00:00, 7623.17it/s]
loading dev data to CRFsuite:  27%|██▋       | 769/2895 [00:00&amp;lt;00:00, 7689.13it/s]
loading dev data to CRFsuite: 100%|██████████| 2895/2895 [00:00&amp;lt;00:00, 7186.08it/s]
Holdout group: 2

Feature generation
type: CRF1d
feature.minfreq: 0.000000
feature.possible_states: 0
feature.possible_transitions: 0
0....1....2....3....4....5....6....7....8....9....10
Number of features: 152117
Seconds required: 0.424

L-BFGS optimization
c1: 0.000000
c2: 1.000000
num_memories: 6
max_iterations: 100
epsilon: 0.000010
stop: 10
delta: 0.000010
linesearch: MoreThuente
linesearch.max_iterations: 20

Iter 1   time=0.37  loss=104214.83 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.00
Iter 2   time=0.21  loss=96997.81 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.13
Iter 3   time=0.21  loss=92085.38 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.26
Iter 4   time=0.21  loss=84277.67 active=152117 precision=0.100  recall=0.111  F1=0.105  Acc(item/seq)=0.901 0.496  feature_norm=1.51
Iter 5   time=0.21  loss=67577.53 active=152117 precision=0.169  recall=0.113  F1=0.109  Acc(item/seq)=0.902 0.496  feature_norm=2.32
Iter 6   time=0.21  loss=47854.26 active=152117 precision=0.326  recall=0.347  F1=0.320  Acc(item/seq)=0.930 0.580  feature_norm=4.34
Iter 7   time=0.21  loss=43326.19 active=152117 precision=0.340  recall=0.365  F1=0.333  Acc(item/seq)=0.933 0.592  feature_norm=5.06
Iter 8   time=0.21  loss=38617.07 active=152117 precision=0.372  recall=0.399  F1=0.362  Acc(item/seq)=0.938 0.618  feature_norm=6.43
Iter 9   time=0.21  loss=35511.85 active=152117 precision=0.491  recall=0.442  F1=0.421  Acc(item/seq)=0.942 0.631  feature_norm=8.62
Iter 10  time=0.21  loss=32735.31 active=152117 precision=0.535  recall=0.456  F1=0.445  Acc(item/seq)=0.944 0.654  feature_norm=9.50
Iter 11  time=0.21  loss=31687.17 active=152117 precision=0.530  recall=0.491  F1=0.502  Acc(item/seq)=0.947 0.663  feature_norm=10.94
Iter 12  time=0.21  loss=29323.19 active=152117 precision=0.532  recall=0.482  F1=0.477  Acc(item/seq)=0.946 0.666  feature_norm=11.18
Iter 13  time=0.21  loss=28733.55 active=152117 precision=0.596  recall=0.489  F1=0.489  Acc(item/seq)=0.947 0.668  feature_norm=11.58
Iter 14  time=0.21  loss=27120.69 active=152117 precision=0.640  recall=0.507  F1=0.512  Acc(item/seq)=0.948 0.677  feature_norm=12.36
Iter 15  time=0.21  loss=24849.05 active=152117 precision=0.640  recall=0.547  F1=0.558  Acc(item/seq)=0.952 0.697  feature_norm=13.86
Iter 16  time=0.40  loss=24033.40 active=152117 precision=0.654  recall=0.580  F1=0.586  Acc(item/seq)=0.954 0.706  feature_norm=14.53
Iter 17  time=0.21  loss=22935.94 active=152117 precision=0.669  recall=0.578  F1=0.598  Acc(item/seq)=0.955 0.712  feature_norm=15.15
Iter 18  time=0.21  loss=21803.53 active=152117 precision=0.682  recall=0.584  F1=0.605  Acc(item/seq)=0.956 0.713  feature_norm=15.67
Iter 19  time=0.21  loss=21046.75 active=152117 precision=0.725  recall=0.565  F1=0.586  Acc(item/seq)=0.956 0.717  feature_norm=16.04
Iter 20  time=0.21  loss=20465.96 active=152117 precision=0.701  recall=0.566  F1=0.594  Acc(item/seq)=0.956 0.711  feature_norm=15.81
Iter 21  time=0.21  loss=19991.29 active=152117 precision=0.706  recall=0.586  F1=0.611  Acc(item/seq)=0.957 0.717  feature_norm=15.63
Iter 22  time=0.21  loss=19560.23 active=152117 precision=0.684  recall=0.597  F1=0.616  Acc(item/seq)=0.957 0.722  feature_norm=15.58
Iter 23  time=0.21  loss=19241.14 active=152117 precision=0.672  recall=0.602  F1=0.615  Acc(item/seq)=0.957 0.726  feature_norm=15.65
Iter 24  time=0.21  loss=18787.87 active=152117 precision=0.678  recall=0.627  F1=0.637  Acc(item/seq)=0.958 0.731  feature_norm=16.31
Iter 25  time=0.21  loss=18145.13 active=152117 precision=0.690  recall=0.625  F1=0.640  Acc(item/seq)=0.959 0.734  feature_norm=16.94
Iter 26  time=0.21  loss=17786.38 active=152117 precision=0.710  recall=0.621  F1=0.642  Acc(item/seq)=0.959 0.738  feature_norm=17.48
Iter 27  time=0.21  loss=17247.02 active=152117 precision=0.711  recall=0.625  F1=0.649  Acc(item/seq)=0.959 0.736  feature_norm=18.46
Iter 28  time=0.21  loss=16876.01 active=152117 precision=0.737  recall=0.627  F1=0.655  Acc(item/seq)=0.960 0.748  feature_norm=19.83
Iter 29  time=0.21  loss=16543.87 active=152117 precision=0.732  recall=0.631  F1=0.663  Acc(item/seq)=0.961 0.751  feature_norm=20.12
Iter 30  time=0.21  loss=16263.21 active=152117 precision=0.725  recall=0.644  F1=0.671  Acc(item/seq)=0.962 0.753  feature_norm=20.42
Iter 31  time=0.21  loss=15665.78 active=152117 precision=0.715  recall=0.661  F1=0.676  Acc(item/seq)=0.963 0.758  feature_norm=21.50
Iter 32  time=0.21  loss=15247.34 active=152117 precision=0.700  recall=0.641  F1=0.650  Acc(item/seq)=0.961 0.748  feature_norm=23.18
Iter 33  time=0.21  loss=14866.51 active=152117 precision=0.702  recall=0.658  F1=0.670  Acc(item/seq)=0.963 0.754  feature_norm=24.67
Iter 34  time=0.21  loss=14650.96 active=152117 precision=0.704  recall=0.659  F1=0.675  Acc(item/seq)=0.963 0.755  feature_norm=25.38
Iter 35  time=0.21  loss=14386.98 active=152117 precision=0.730  recall=0.677  F1=0.695  Acc(item/seq)=0.964 0.761  feature_norm=26.47
Iter 36  time=0.21  loss=14158.49 active=152117 precision=0.742  recall=0.681  F1=0.704  Acc(item/seq)=0.965 0.763  feature_norm=28.58
Iter 37  time=0.21  loss=13895.30 active=152117 precision=0.736  recall=0.684  F1=0.701  Acc(item/seq)=0.965 0.765  feature_norm=28.72
Iter 38  time=0.21  loss=13656.45 active=152117 precision=0.730  recall=0.683  F1=0.695  Acc(item/seq)=0.965 0.768  feature_norm=29.07
Iter 39  time=0.21  loss=13499.28 active=152117 precision=0.727  recall=0.680  F1=0.691  Acc(item/seq)=0.965 0.769  feature_norm=29.61
Iter 40  time=0.21  loss=13174.95 active=152117 precision=0.726  recall=0.677  F1=0.689  Acc(item/seq)=0.965 0.766  feature_norm=31.03
Iter 41  time=0.21  loss=13104.00 active=152117 precision=0.736  recall=0.662  F1=0.678  Acc(item/seq)=0.964 0.760  feature_norm=33.06
Iter 42  time=0.21  loss=12750.79 active=152117 precision=0.731  recall=0.685  F1=0.701  Acc(item/seq)=0.966 0.764  feature_norm=34.35
Iter 43  time=0.21  loss=12637.02 active=152117 precision=0.740  recall=0.690  F1=0.708  Acc(item/seq)=0.966 0.766  feature_norm=34.63
Iter 44  time=0.21  loss=12534.20 active=152117 precision=0.745  recall=0.692  F1=0.712  Acc(item/seq)=0.966 0.766  feature_norm=35.30
Iter 45  time=0.21  loss=12390.89 active=152117 precision=0.739  recall=0.682  F1=0.702  Acc(item/seq)=0.966 0.761  feature_norm=37.15
Iter 46  time=0.21  loss=12277.42 active=152117 precision=0.733  recall=0.689  F1=0.704  Acc(item/seq)=0.966 0.763  feature_norm=37.58
Iter 47  time=0.21  loss=12219.03 active=152117 precision=0.739  recall=0.690  F1=0.706  Acc(item/seq)=0.966 0.767  feature_norm=37.20
Iter 48  time=0.21  loss=12125.64 active=152117 precision=0.743  recall=0.695  F1=0.711  Acc(item/seq)=0.967 0.770  feature_norm=36.99
Iter 49  time=0.21  loss=11970.65 active=152117 precision=0.745  recall=0.697  F1=0.712  Acc(item/seq)=0.967 0.775  feature_norm=36.84
Iter 50  time=0.21  loss=11780.73 active=152117 precision=0.754  recall=0.696  F1=0.718  Acc(item/seq)=0.968 0.777  feature_norm=37.57
Iter 51  time=0.21  loss=11623.83 active=152117 precision=0.740  recall=0.691  F1=0.710  Acc(item/seq)=0.967 0.774  feature_norm=38.21
Iter 52  time=0.21  loss=11549.38 active=152117 precision=0.740  recall=0.690  F1=0.709  Acc(item/seq)=0.967 0.773  feature_norm=38.94
Iter 53  time=0.21  loss=11497.85 active=152117 precision=0.739  recall=0.696  F1=0.713  Acc(item/seq)=0.967 0.772  feature_norm=39.54
Iter 54  time=0.21  loss=11419.64 active=152117 precision=0.733  recall=0.692  F1=0.707  Acc(item/seq)=0.966 0.773  feature_norm=40.40
Iter 55  time=0.21  loss=11280.29 active=152117 precision=0.744  recall=0.707  F1=0.719  Acc(item/seq)=0.967 0.773  feature_norm=41.75
Iter 56  time=0.21  loss=11131.39 active=152117 precision=0.746  recall=0.710  F1=0.722  Acc(item/seq)=0.968 0.773  feature_norm=42.72
Iter 57  time=0.21  loss=11043.40 active=152117 precision=0.751  recall=0.713  F1=0.726  Acc(item/seq)=0.968 0.774  feature_norm=42.92
Iter 58  time=0.21  loss=10954.38 active=152117 precision=0.769  recall=0.713  F1=0.736  Acc(item/seq)=0.969 0.781  feature_norm=43.18
Iter 59  time=0.21  loss=10836.31 active=152117 precision=0.773  recall=0.713  F1=0.736  Acc(item/seq)=0.969 0.781  feature_norm=43.74
Iter 60  time=0.21  loss=10712.24 active=152117 precision=0.779  recall=0.719  F1=0.744  Acc(item/seq)=0.970 0.788  feature_norm=44.41
Iter 61  time=0.21  loss=10602.81 active=152117 precision=0.789  recall=0.709  F1=0.740  Acc(item/seq)=0.970 0.789  feature_norm=44.68
Iter 62  time=0.21  loss=10508.84 active=152117 precision=0.782  recall=0.711  F1=0.739  Acc(item/seq)=0.970 0.787  feature_norm=45.37
Iter 63  time=0.21  loss=10458.88 active=152117 precision=0.783  recall=0.717  F1=0.744  Acc(item/seq)=0.970 0.788  feature_norm=45.51
Iter 64  time=0.21  loss=10420.78 active=152117 precision=0.763  recall=0.711  F1=0.730  Acc(item/seq)=0.969 0.786  feature_norm=45.67
Iter 65  time=0.21  loss=10315.28 active=152117 precision=0.766  recall=0.721  F1=0.735  Acc(item/seq)=0.969 0.786  feature_norm=46.30
Iter 66  time=0.21  loss=10204.10 active=152117 precision=0.769  recall=0.728  F1=0.740  Acc(item/seq)=0.970 0.786  feature_norm=47.10
Iter 67  time=0.21  loss=10134.54 active=152117 precision=0.769  recall=0.716  F1=0.737  Acc(item/seq)=0.970 0.787  feature_norm=47.87
Iter 68  time=0.21  loss=10095.10 active=152117 precision=0.773  recall=0.718  F1=0.741  Acc(item/seq)=0.970 0.787  feature_norm=47.85
Iter 69  time=0.21  loss=10059.52 active=152117 precision=0.773  recall=0.715  F1=0.738  Acc(item/seq)=0.970 0.790  feature_norm=47.81
Iter 70  time=0.21  loss=10012.53 active=152117 precision=0.767  recall=0.712  F1=0.733  Acc(item/seq)=0.970 0.790  feature_norm=47.95
Iter 71  time=0.21  loss=9931.41  active=152117 precision=0.765  recall=0.710  F1=0.731  Acc(item/seq)=0.970 0.791  feature_norm=48.49
Iter 72  time=0.21  loss=9861.45  active=152117 precision=0.763  recall=0.709  F1=0.730  Acc(item/seq)=0.970 0.793  feature_norm=49.07
Iter 73  time=0.21  loss=9803.75  active=152117 precision=0.769  recall=0.723  F1=0.741  Acc(item/seq)=0.970 0.796  feature_norm=49.32
Iter 74  time=0.21  loss=9762.61  active=152117 precision=0.758  recall=0.720  F1=0.733  Acc(item/seq)=0.970 0.793  feature_norm=49.58
Iter 75  time=0.21  loss=9726.47  active=152117 precision=0.761  recall=0.722  F1=0.736  Acc(item/seq)=0.970 0.790  feature_norm=49.80
Iter 76  time=0.21  loss=9628.97  active=152117 precision=0.764  recall=0.722  F1=0.736  Acc(item/seq)=0.970 0.789  feature_norm=50.47
Iter 77  time=0.40  loss=9586.61  active=152117 precision=0.763  recall=0.725  F1=0.740  Acc(item/seq)=0.971 0.791  feature_norm=50.96
Iter 78  time=0.21  loss=9522.00  active=152117 precision=0.767  recall=0.723  F1=0.741  Acc(item/seq)=0.970 0.788  feature_norm=51.34
Iter 79  time=0.21  loss=9479.87  active=152117 precision=0.765  recall=0.720  F1=0.736  Acc(item/seq)=0.970 0.788  feature_norm=51.77
Iter 80  time=0.21  loss=9448.33  active=152117 precision=0.771  recall=0.720  F1=0.739  Acc(item/seq)=0.970 0.789  feature_norm=51.80
Iter 81  time=0.21  loss=9423.55  active=152117 precision=0.769  recall=0.723  F1=0.740  Acc(item/seq)=0.970 0.789  feature_norm=51.83
Iter 82  time=0.21  loss=9367.51  active=152117 precision=0.763  recall=0.720  F1=0.736  Acc(item/seq)=0.970 0.790  feature_norm=52.25
Iter 83  time=0.21  loss=9322.98  active=152117 precision=0.762  recall=0.722  F1=0.737  Acc(item/seq)=0.970 0.793  feature_norm=52.34
Iter 84  time=0.21  loss=9277.51  active=152117 precision=0.765  recall=0.725  F1=0.740  Acc(item/seq)=0.971 0.798  feature_norm=52.47
Iter 85  time=0.21  loss=9229.07  active=152117 precision=0.772  recall=0.726  F1=0.743  Acc(item/seq)=0.971 0.798  feature_norm=52.79
Iter 86  time=0.21  loss=9214.50  active=152117 precision=0.782  recall=0.729  F1=0.751  Acc(item/seq)=0.971 0.799  feature_norm=52.88
Iter 87  time=0.21  loss=9194.55  active=152117 precision=0.785  recall=0.731  F1=0.753  Acc(item/seq)=0.971 0.800  feature_norm=52.82
Iter 88  time=0.21  loss=9182.23  active=152117 precision=0.783  recall=0.728  F1=0.750  Acc(item/seq)=0.971 0.798  feature_norm=52.86
Iter 89  time=0.21  loss=9161.54  active=152117 precision=0.786  recall=0.726  F1=0.750  Acc(item/seq)=0.971 0.795  feature_norm=52.98
Iter 90  time=0.21  loss=9130.06  active=152117 precision=0.788  recall=0.728  F1=0.752  Acc(item/seq)=0.971 0.797  feature_norm=53.17
Iter 91  time=0.21  loss=9063.09  active=152117 precision=0.794  recall=0.738  F1=0.760  Acc(item/seq)=0.972 0.800  feature_norm=53.64
Iter 92  time=0.40  loss=9041.98  active=152117 precision=0.791  recall=0.739  F1=0.759  Acc(item/seq)=0.972 0.801  feature_norm=53.77
Iter 93  time=0.21  loss=9010.27  active=152117 precision=0.783  recall=0.736  F1=0.754  Acc(item/seq)=0.972 0.801  feature_norm=53.97
Iter 94  time=0.21  loss=8984.80  active=152117 precision=0.786  recall=0.740  F1=0.759  Acc(item/seq)=0.972 0.803  feature_norm=54.09
Iter 95  time=0.21  loss=8965.14  active=152117 precision=0.788  recall=0.740  F1=0.760  Acc(item/seq)=0.972 0.803  feature_norm=54.14
Iter 96  time=0.21  loss=8931.00  active=152117 precision=0.786  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.804  feature_norm=54.26
Iter 97  time=0.21  loss=8923.90  active=152117 precision=0.791  recall=0.737  F1=0.757  Acc(item/seq)=0.971 0.799  feature_norm=54.68
Iter 98  time=0.21  loss=8860.78  active=152117 precision=0.784  recall=0.735  F1=0.755  Acc(item/seq)=0.971 0.802  feature_norm=54.64
Iter 99  time=0.21  loss=8846.13  active=152117 precision=0.788  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.803  feature_norm=54.68
Iter 100 time=0.21  loss=8832.34  active=152117 precision=0.789  recall=0.737  F1=0.758  Acc(item/seq)=0.972 0.803  feature_norm=54.77
================================================
Label      Precision    Recall     F1    Support
-------  -----------  --------  -----  ---------
B-LOC          0.823     0.823  0.823        479
B-MISC         0.806     0.651  0.720        748
B-ORG          0.853     0.620  0.718        686
B-PER          0.752     0.855  0.800        703
I-LOC          0.583     0.547  0.565         64
I-MISC         0.645     0.507  0.568        215
I-ORG          0.806     0.684  0.740        396
I-PER          0.844     0.946  0.892        423
O              0.990     0.998  0.994      33973
------------------------------------------------
L-BFGS terminated with the maximum number of iterations
Total seconds required for training: 21.768

Storing the model
Number of active features: 152117 (152117)
Number of active attributes: 130306 (145602)
Number of active labels: 9 (9)
Writing labels
Writing attributes
Writing feature references for transitions
Writing feature references for attributes
Seconds required: 0.058

CRF(algorithm=&#x27;lbfgs&#x27;, all_possible_states=None,
  all_possible_transitions=None, averaging=None, c=None, c1=None, c2=None,
  calibration_candidates=None, calibration_eta=None,
  calibration_max_trials=None, calibration_rate=None,
  calibration_samples=None, delta=None, epsilon=None, error_sensitive=None,
  gamma=None, keep_tempfiles=None, linesearch=None, max_iterations=100,
  max_linesearch=None, min_freq=None, model_filename=None,
  num_memories=None, pa_type=None, period=None, trainer_cls=None,
  variance=None, verbose=&#x27;true&#x27;)
import joblib
import os

OUTPUT_PATH = "models/ner/"
OUTPUT_FILE = "crf_model"

if not os.path.exists(OUTPUT_PATH):
    os.mkdir(OUTPUT_PATH)

joblib.dump(crf, os.path.join(OUTPUT_PATH, OUTPUT_FILE))
[&#x27;models/ner/crf_model&#x27;]

评估

让我们评估 CRF 的输出。我们将从上面的输出文件加载模型,并让它预测完整测试集的标签。

作为健全性检查,让我们看一下它对第一个测试句子的预测。此输出看起来非常好:CRF 能够正确地预测句子中的所有四个位置。它只错过了人物实体,这本身就是一个奇怪的案例,因为它实际上不是人名。

crf = joblib.load(os.path.join(OUTPUT_PATH, OUTPUT_FILE))
y_pred = crf.predict(X_test)

example_sent = test_sents[0]

print("Sentence:", ' '.join(sent2tokens(example_sent)))
print("Predicted:", ' '.join(crf.predict([sent2features(example_sent, word2cluster)])[0]))
print("Correct:  ", ' '.join(sent2labels(example_sent)))
Sentence: Dat is in Italië , Spanje of Engeland misschien geen probleem , maar volgens &#x27; Der Kaiser &#x27; in Duitsland wel .
Predicted: O O O B-LOC O B-LOC O B-LOC O O O O O O O B-MISC I-MISC O O B-LOC O O
Correct:   O O O B-LOC O B-LOC O B-LOC O O O O O O O B-PER I-PER O O B-LOC O O

现在,我们在完整的测试集上进行评估。我们将打印所有标签(除了 O)的分类报告。如果我们包含 O,它在我们的数据中远远超过实体标签,那么平均分数将被人工夸大,仅仅因为 CRF 的 O 标签本就具有很高的正确概率。我们在所有实体类型中获得了 77% 的平均 F 分数(微平均),其中 B-LOCB-PER 的结果特别好。

labels = list(crf.classes_)
labels.remove("O")
y_pred = crf.predict(X_test)
sorted_labels = sorted(
    labels,
    key=lambda name: (name[1:], name[0])
)

print(metrics.flat_classification_report(y_test, y_pred, labels=sorted_labels))
              precision    recall  f1-score   support

       B-LOC       0.83      0.83      0.83       774
       I-LOC       0.29      0.41      0.34        49
      B-MISC       0.84      0.61      0.71      1187
      I-MISC       0.59      0.42      0.49       410
       B-ORG       0.80      0.69      0.74       882
       I-ORG       0.74      0.66      0.70       551
       B-PER       0.80      0.90      0.85      1098
       I-PER       0.87      0.95      0.91       807

   micro avg       0.80      0.74      0.77      5758
   macro avg       0.72      0.68      0.70      5758
weighted avg       0.80      0.74      0.76      5758

现在,我们还可以查看 CRF 识别出的最可能的转换,以及每个标签的顶部特征。我们将使用 eli5 库来完成此操作,它可以帮助我们解释机器学习模型的预测。

顶部转换非常直观:最可能的转换是在同一实体类型内(从 B 标签到 O 标签),以及 B 标签位于 O 标签之后的那些转换。

特征也有道理。例如,如果单词不以大写字母开头,它不太可能是实体。相反,如果单词以 结尾,它很可能是一个位置,这确实是荷兰语中位置的常见后缀。还要注意嵌入集群的信息量有多大:对于所有实体类型,单词集群构成 CRF 的一些最具信息量的特征。

import eli5

eli5.show_weights(crf, top=30)
从 \ 到 O B-LOC I-LOC B-MISC I-MISC B-ORG I-ORG B-PER I-PER
O 4.141 4.583 0.0 4.141 0.0 4.366 0.0 3.819 0.0
B-LOC -0.248 -0.279 7.101 0.0 0.0 0.0 0.0 -0.661 0.0
I-LOC -1.062 -0.235 5.967 0.0 0.0 0.0 0.0 0.0 0.0
B-MISC -0.985 0.655 0.0 -0.316 7.73 0.551 0.0 0.46 0.0
I-MISC -1.781 0.0 0.0 -0.382 7.769 1.145 0.0 -0.719 0.0
B-ORG -0.261 0.0 0.0 -0.809 0.0 0.0 7.803 0.106 0.0
I-ORG -0.794 0.0 0.0 0.0 0.0 0.0 7.174 0.084 0.0
B-PER 0.31 -0.346 0.0 -0.611 0.0 0.0 0.0 -1.408 8.68
I-PER 0.104 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6.804
y=O 顶部特征 y=B-LOC 顶部特征 y=I-LOC 顶部特征 y=B-MISC 顶部特征 y=I-MISC 顶部特征 y=B-ORG 顶部特征 y=I-ORG 顶部特征 y=B-PER 顶部特征 y=I-PER 顶部特征
权重? 特征
+3.692 word.istitle=False
+3.312 word.isupper=False
+2.211 -1:word.lower="
+1.999 -1:word.lower=+
+1.835 EOS
+1.820 BOS
+1.740 word.cluster=158
+1.684 word.cluster=415
+1.590 word[-2:]=ag
+1.588 word.cluster=195
+1.556 word[-3:]=dag
+1.520 word.cluster=185
+1.520 word.cluster=178
+1.481 word.cluster=24
+1.477 word.cluster=177
+1.475 word.lower=u
… 108173 个更多正值 …
… 6671 个更多负值 …
-1.515 word.cluster=375
-1.538 -1:word.lower=ronde
-1.561 +1:word.lower=(
-1.618 0
-1.693 -1:word.lower=de
-1.740 postag=N
-1.808 word[-2:]='s
-1.843 word[-3:]=ple
-1.853 postag=Misc
-1.914 word.cluster=6
-1.945 +2:word.lower=1
-1.990 word.cluster=111
-2.068 word.isupper=True
-2.447 word.istitle=True
权重? 特征
+3.734 word.cluster=325
+3.650 word.cluster=375
+3.466 word.cluster=68
+3.169 word.cluster=139
+2.020 -1:word.lower=in
+1.995 word.cluster=143
+1.973 word.cluster=476
+1.828 word.cluster=102
+1.617 word[-2:]=ië
+1.538 -1:word.lower=(
+1.516 -1:word.lower=uit
+1.495 word.cluster=154
+1.441 +1:word.lower=/
+1.417 +2:word.lower=-
+1.404 +2:word.lower=tel.
+1.379 -1:word.lower=het
+1.374 word.cluster=440
+1.369 word.lower=sint-michiels
+1.337 word[-3:]=urs
+1.320 word.lower=futuroscope
+1.301 word[-3:]=ope
+1.301 word.cluster=363
+1.278 word.lower=vst
+1.278 word[-3:]=VSt
+1.257 word[-3:]=den
+1.245 word.cluster=146
+1.244 word.cluster=116
+1.212 word[-2:]=St
+1.198 word.cluster=492
… 4441 个更多正值 …
… 668 个更多负值 …
-1.322 -2:word.lower=ronde
权重? 特征
+1.778 word.cluster=238
+1.488 +2:word.lower=m
+1.367 word.cluster=161
+0.996 -1:word.lower=col
+0.977 word[-2:]=rk
+0.852 -1:word.istitle=False
+0.821 word[-2:]=al
+0.810 word.lower=york
+0.810 word[-3:]=ork
+0.809 word[-3:]=eum
+0.781 -1:word.lower=san
+0.703 word.lower=abeba
+0.703 -1:word.lower=addis
+0.702 word[-3:]=eba
+0.701 -1:word.isupper=True
+0.683 -2:word.lower='
+0.682 +2:word.lower=,
+0.677 word[-3:]=den
+0.676 word[-3:]=aan
+0.674 word[-2:]=um
+0.670 word.lower=staten
+0.669 word.cluster=38
+0.661 postag=Prep
+0.650 -2:word.lower=col
+0.650 -1:postag=Art
+0.648 -1:word.lower=museum
+0.645 word[-2:]=ba
+0.643 word.lower=hotel
… 1294 个更多正值 …
… 131 个更多负值 …
-0.674 EOS
-0.781 postag=Adj
权重? 特征
+3.318 word.cluster=23
+2.494 word.cluster=100
+2.419 word.cluster=39
+2.097 +2:word.lower=1
+2.039 word.cluster=294
+2.036 word.cluster=338
+1.786 word[-2:]='s
+1.772 word.cluster=11
+1.700 word.lower=sport
+1.646 word.lower=buitenland
+1.628 word[-2:]=se
+1.618 word.cluster=222
+1.535 word.cluster=427
+1.505 word.lower=journaal
+1.478 word.cluster=40
+1.417 word[-3:]=nse
+1.411 postag=Adj
+1.374 word.cluster=111
+1.350 word.lower=aula
+1.347 word.lower=tobin-taks
+1.329 word.cluster=218
+1.301 word.cluster=128
+1.274 word[-3:]=ula
+1.223 BOS
+1.213 word.lower=tobin-heffing
+1.204 word.lower=ronde
+1.204 word[-2:]=ks
+1.197 word.lower=v-plan
+1.184 word[-2:]=ex
… 5588 个更积极的 …
… 853 个更消极的 …
-1.256 -1:word.isupper=True
权重? 特征
+1.792 -2:word.lower=ronde
+1.684 -1:word.isupper=True
+1.567 -1:word.lower=ronde
+1.332 word.cluster=37
+1.323 +1:word.lower=ned
+1.316 word.cluster=325
+1.298 word.cluster=1
+1.274 word.lower=leven
+1.215 -1:word.istitle=True
+1.201 -1:postag=Num
+1.182 -1:word.lower=euro
+1.161 word.cluster=86
+1.147 word.lower=bijsluiter
+1.103 -1:word.lower=financiële
+1.100 +2:word.lower=ned
+1.096 word[-2:]=ap
+1.068 postag=Num
+1.042 word.cluster=413
+1.029 word[-2:]=00
+1.012 -1:word.lower=de
+1.010 -1:postag=Prep
+0.979 word.cluster=279
+0.954 -1:word.lower=travel
+0.950 -1:word.lower=formule
+0.947 word.lower=financiële
+0.945 -1:word.lower=brussel
+0.916 +1:word.lower='
… 3172 个更积极的 …
… 465 个更消极的 …
-1.115 postag=V
-1.467 +1:postag=Num
-2.953 -1:postag=Adj
权重? 特征
+2.798 word.cluster=424
+2.635 word.cluster=228
+2.121 word[-3:]=com
+1.991 word.cluster=187
+1.974 word.lower=quizpeople
+1.922 word[-3:]=ple
+1.848 +1:word.lower=morgen
+1.798 word.cluster=250
+1.683 word.cluster=83
+1.560 word.cluster=29
+1.538 word[-2:]=ga
+1.525 BOS
+1.500 word[-3:]=bel
+1.476 -2:word.lower=minister
+1.422 word.cluster=207
+1.418 word.cluster=249
+1.412 word.lower=waterleau
+1.411 word.isupper=True
+1.384 word.lower=belga
+1.384 word[-3:]=lga
+1.328 word[-3:]=our
+1.320 word[-2:]=om
+1.319 word[-2:]=to
+1.299 word.lower=freenet
+1.277 word.lower=baan
+1.240 -1:word.lower=bij
+1.227 word[-3:]=lux
+1.227 word[-2:]=co
+1.222 word.cluster=21
… 3591 个更积极的 …
… 469 个更消极的 …
-1.156 word.isupper=False
权重? 特征
+1.338 word.lower=morgen
+1.304 word.cluster=403
+1.243 word.cluster=413
+1.200 -1:word.lower=vlaams
+1.141 word.cluster=187
+1.120 word[-3:]=gen
+1.101 word[-3:]=ion
+1.057 -1:word.lower=radio
+1.028 word.cluster=321
+0.970 -1:word.lower=ned
+0.849 word[-2:]=nt
+0.841 word.cluster=143
+0.805 -2:word.lower=voor
+0.767 word.cluster=375
+0.767 -1:postag=Misc
+0.763 word[-2:]=es
+0.760 word[-2:]=3
+0.760 word.lower=3
+0.760 word[-3:]=3
+0.745 word.cluster=478
+0.733 word.cluster=411
+0.719 word[-2:]=ey
+0.705 word[-3:]=ola
+0.702 word[-2:]=le
… 2209 个更积极的 …
… 250 个更消极的 …
-0.777 -2:postag=V
-0.793 word[-2:]=in
-0.824 -1:word.lower=financiële
-0.842 -1:postag=Num
-1.090 0
-1.162 postag=V
权重? 特征
+3.523 word.cluster=489
+2.888 word.cluster=204
+2.818 word.cluster=301
+2.804 word.cluster=3
+2.765 word.cluster=246
+2.444 word.cluster=337
+2.419 word.cluster=6
+2.361 word.cluster=326
+2.199 word.cluster=296
+2.069 word.cluster=87
+1.975 word.cluster=349
+1.727 word.cluster=12
+1.575 word.cluster=105
+1.569 word.cluster=350
+1.448 word.lower=bode
+1.439 word.cluster=85
+1.347 word[-2:]=ov
+1.340 +1:word.lower=(
+1.333 -1:word.lower=-
+1.320 word.cluster=111
+1.307 +2:word.lower=--
+1.301 word[-2:]=ff
+1.300 -1:postag=V
+1.248 word[-2:]=án
+1.247 -1:word.lower=volgens
+1.231 word[-3:]=par
… 7464 个更积极的 …
… 851 个更消极的 …
-1.349 word[-3:]=ing
-1.360 word[-2:]=ur
-1.610 -1:word.lower=in
-1.937 -1:postag=Art
权重? 特征
+1.748 -1:word.lower=van
+1.425 word.cluster=3
+1.313 word.cluster=388
+1.287 word.cluster=450
+1.250 word.cluster=6
+1.231 word.cluster=249
+1.152 +1:word.lower=(
+1.127 +2:word.lower=die
+1.044 word.lower=gucht
+0.970 word.cluster=337
+0.927 -1:word.lower=pu
+0.927 word[-3:]=nfu
+0.927 word.lower=shenfu
+0.925 word[-2:]=fu
+0.921 word.cluster=296
+0.895 word.lower=grauwe
+0.875 word[-3:]=uwe
+0.869 word[-2:]=we
+0.867 -1:word.lower=de
+0.841 word[-2:]=ck
+0.815 -1:postag=Pron
+0.814 word.lower=den
… 4971 个更积极的 …
… 548 个更消极的 …
-0.814 word[-2:]=ma
-0.824 word[-3:]=aan
-0.851 word[-2:]=al
-0.961 word.cluster=238
-1.032 word.istitle=False
-1.055 postag=V
-1.071 word[-3:]=ter
-1.079 word[-2:]=um

寻找最佳超参数

到目前为止,我们已经使用默认参数训练了一个模型。这些参数不太可能让我们获得最佳性能。因此,我们将通过迭代训练不同的模型并对其进行评估来自动搜索最佳超参数设置。最终,我们将选择最佳模型。

这里我们将重点关注两个参数:c1c2。它们分别是 L1 和 L2 正则化的参数。正则化通过在损失函数中添加惩罚来防止对训练数据的过拟合。在 L1 正则化中,这种惩罚是权重绝对值的总和;在 L2 正则化中,它是权重平方值的总和。L1 正则化执行一种特征选择,因为它将 0 权重分配给不相关的特征。相比之下,L2 正则化使不相关特征的权重变小,但不一定为零。L1 正则化通常称为 Lasso 方法,L2 称为 Ridge 方法,而两者的线性组合称为弹性网络正则化。

我们定义了 c1 和 c2 的参数空间,并使用 flat F1 分数来比较各个模型。我们将依靠三折交叉验证来对 50 个候选者中的每一个进行评分。我们使用随机搜索,这意味着我们不会尝试所有指定的参数设置,而是让该过程从我们在参数空间中指定的分布中随机采样。它将进行 50 次 (n_iter)。这个过程需要一段时间,但值得等待。

import scipy
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV

crf = crfsuite.CRF(
    algorithm='lbfgs',
    max_iterations=100,
    all_possible_transitions=True
)

params_space = {
    'c1': scipy.stats.expon(scale=0.5),
    'c2': scipy.stats.expon(scale=0.05),
}

f1_scorer = make_scorer(metrics.flat_f1_score,
                        average='weighted', labels=labels)

rs = RandomizedSearchCV(crf, params_space,
                        cv=3,
                        verbose=1,
                        n_jobs=-1,
                        n_iter=50,
                        scoring=f1_scorer)
rs.fit(X_train, y_train)
Fitting 3 folds for each of 50 candidates, totalling 150 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 24.3min finished
RandomizedSearchCV(cv=3, error_score=&#x27;raise-deprecating&#x27;,
          estimator=CRF(algorithm=&#x27;lbfgs&#x27;, all_possible_states=None,
  all_possible_transitions=True, averaging=None, c=None, c1=None, c2=None,
  calibration_candidates=None, calibration_eta=None,
  calibration_max_trials=None, calibration_rate=None,
  calibration_samples=None, delta=None, epsilon=None, error...e,
  num_memories=None, pa_type=None, period=None, trainer_cls=None,
  variance=None, verbose=False),
          fit_params=None, iid=&#x27;warn&#x27;, n_iter=50, n_jobs=-1,
          param_distributions={&#x27;c1&#x27;: &lt;scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9947f04e10&gt;, &#x27;c2&#x27;: &lt;scipy.stats._distn_infrastructure.rv_frozen object at 0x7f9947f04c88&gt;},
          pre_dispatch=&#x27;2*n_jobs&#x27;, random_state=None, refit=True,
          return_train_score=&#x27;warn&#x27;,
          scoring=make_scorer(flat_f1_score, average=weighted, labels=[&#x27;B-ORG&#x27;, &#x27;B-MISC&#x27;, &#x27;B-PER&#x27;, &#x27;I-PER&#x27;, &#x27;B-LOC&#x27;, &#x27;I-MISC&#x27;, &#x27;I-ORG&#x27;, &#x27;I-LOC&#x27;]),
          verbose=1)

让我们看一下最佳超参数设置。我们的随机搜索建议使用 L1 和 L2 规范化的组合。

print('best params:', rs.best_params_)
print('best CV score:', rs.best_score_)
print('model size: {:0.2f}M'.format(rs.best_estimator_.size_ / 1000000))
best params: {&#x27;c1&#x27;: 0.08869645933566639, &#x27;c2&#x27;: 0.005642379370340676}
best CV score: 0.7608794798691931
model size: 1.06M

为了找出这将转化为多少精度、召回率和 F1 分数,我们从随机搜索中获取最佳估计器,并将其在测试集上进行评估。这确实显示出与我们初始模型相比有了不错的改进。我们的平均 F1 分数从 77% 提高到了 79.1%。精度和召回率都得到了提高,我们看到了所有四种实体类型的积极结果。

best_crf = rs.best_estimator_
y_pred = best_crf.predict(X_test)
print(metrics.flat_classification_report(
    y_test, y_pred, labels=sorted_labels, digits=3
))
              precision    recall  f1-score   support

       B-LOC      0.849     0.863     0.856       774
       I-LOC      0.359     0.571     0.441        49
      B-MISC      0.847     0.622     0.717      1187
      I-MISC      0.664     0.415     0.511       410
       B-ORG      0.806     0.727     0.764       882
       I-ORG      0.772     0.677     0.721       551
       B-PER      0.834     0.903     0.867      1098
       I-PER      0.892     0.958     0.924       807

   micro avg      0.823     0.761     0.791      5758
   macro avg      0.753     0.717     0.725      5758
weighted avg      0.820     0.761     0.784      5758

结论

自神经网络模型出现以来,条件随机场已经失去了部分人气。尽管如此,它们对于命名实体识别仍然非常有效,特别是在考虑词嵌入信息时。