The first paper evaluates pre-trained encoders on low-resource NER across several English and German datasets. The second analyzes how relation classification is evaluated and suggests that F1 weightings other than micro-F1 tell us much more about model performance, e.g. on imbalanced datasets.
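
To make the point about F1 weightings concrete, here is a minimal sketch comparing micro- and macro-averaged F1 on a toy imbalanced label set. The label names, the data, and the helper `f1_scores` are hypothetical and not taken from the paper. As a simplifying assumption, every class (including the majority class) is counted in the averages, whereas relation classification benchmarks often exclude the negative class from micro-F1.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1, per_class_f1) for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {label: f1(tp[label], fp[label], fn[label]) for label in labels}
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(per_class.values()) / len(labels)
    return micro, macro, per_class


# Toy imbalanced data (hypothetical): 90 majority-class and 10 minority-class examples.
gold = ["no_relation"] * 90 + ["founded_by"] * 10
# A degenerate model that always predicts the majority class.
pred = ["no_relation"] * 100

micro, macro, per_class = f1_scores(gold, pred)
print(f"micro-F1: {micro:.3f}")
print(f"macro-F1: {macro:.3f}")
print(per_class)
```

In this toy setup the pooled micro score stays high even though the minority class is never predicted, while the macro score drops sharply because each class contributes equally to the average.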