Questions tagged [natural-language]

Natural Language Processing is a set of techniques from linguistics, artificial intelligence, machine learning and statistics that aim to process and understand human languages.

1 vote
0 answers
63 views

In the original InstructGPT paper, the loss of the reward model is as follows: Why do the authors divide by $\binom{K}{2}$? If, for example, we have $7$ prompts and $5$ completions per prompt, the ...
user1446642
2 votes
2 answers
103 views

I'm creating a presentation for some secondary math teachers. I want them to see how AI's ability to write code opens up a lot more data and analysis opportunities for them. For my example, I'm using ...
Sciolism Apparently
0 votes
0 answers
75 views

I was going through the Naive Bayes classifier (from the Cornell Machine Learning course, link here) and I was confused by the use of the Naive Bayes classifier for bag-of-words with the Multinomial ...
Reda A.
3 votes
1 answer
210 views

I am studying from here: https://medium.com/@zafaralibagh6/a-simple-word2vec-tutorial-61e64e38a6a1 The author talks about 2 sets of weights: in the first hidden layer you have the $W^1$ matrix and in the ...
Baron Yugovich
2 votes
0 answers
42 views

Situation: I want to compare the performance of two models on the same task. I have a dataset of around 400 manually curated samples. The task is relatively niche (targeted sentiment analysis on ...
Kelly
  • 21
1 vote
1 answer
130 views

I'm new to machine learning and currently working on new topic discovery and topic modelling under NLP. If I have unlabeled survey responses that I want to categorise but don't know how, run an NMF ...
viktor nikiforov
1 vote
0 answers
42 views

I'm attempting to train a model to parse maritime location ranges. These are strings that can be resolved into a geographical area or a list of shipping ports. An example could be ...
Stromgren
  • 119
0 votes
0 answers
18 views

(EDIT: Note my question is not about 'accuracy'/F1 as a measure of precision, but rather why we can't get my test prediction script to work correctly and how to merge the LoRA back into the RoBERTa ...
Llewellyn van Zyl
0 votes
0 answers
55 views

I am doing a project on spell correction. While evaluating the model results, I came across this situation: the input sentence has no errors, and the model outputs the input sentence as it is, which ...
Tharusha Bandaranayake
1 vote
0 answers
17 views

I’m fine-tuning RoBERTa to classify text into 199 categories (e.g., “acculturation stress,” “cognitive flexibility,” etc.). My dataset has ~15,000 lines of text, each mapped to one of these well-being ...
Llewellyn van Zyl
2 votes
2 answers
164 views

Typically when training for NLP tasks, we need to pad our sequences to a max_len, so they can be processed efficiently in a batch-wise manner. However, these padded ...
Antonios Sarikas
3 votes
1 answer
106 views

Texts introducing ngram models often directly manipulate conditional probabilities. For example, given a corpus $V$ with a bigram model on its words, we would compute the probability of a sentence $...
olives
  • 93
1 vote
0 answers
37 views

I'm using both SpaCy and Stanza to identify named entities in very short strings (brand names and business names): ...
LearningScholar
0 votes
0 answers
94 views

Fast regression on SQL queries. Good ML model architecture. Our goal is to predict which SQL engine (there are 2 currently) will be faster to execute a given query. The input is the query text and in ...
Ark-kun
  • 141
2 votes
0 answers
40 views

I posted this on the Data Science Stack Exchange and didn’t get any responses (that site seems pretty dead). So I’m trying here! I'm working on a project where I have to categorise short texts. I don'...
James
  • 45
1 vote
0 answers
89 views

For context, I've been using feature hashing for a rapid text classifier with a very small number of features (2000, it is very small on purpose). I noticed that some of the results were a bit wonky ...
Felix Labelle
1 vote
1 answer
110 views

I am building a "field tagger" for documents. Basically, a document, in my case something like a proposal or sales quote, would have a bunch of entities scattered throughout it, and we want ...
redbull_nowings
4 votes
2 answers
746 views

I am trying to train a Random Forest model in R for sentiment analysis. The model works with a tf-idf matrix and learns from it how to classify a review as positive or negative. Positive ones are ...
Anisa B.
  • 143
0 votes
1 answer
60 views

I have a set of clinical notes with dates for each patient and an NLP model which gives a score between 0.0 and 1.0 of a certain event being present in the note. Given the scores, what is the best ...
rhn89
  • 101
0 votes
1 answer
88 views

I have a list of hotel names which may or may not be correct, and with different spellings (such as '&' instead of 'and'). I want to use clustering in order to group the hotels with different ...
user480840
1 vote
0 answers
89 views

I want to fine-tune BERT for Named Entity Recognition (NER). However, when fine-tuning over several epochs on different datasets I get a weird behaviour where the training loss decreases, eval loss ...
CodingSquirrel
2 votes
1 answer
235 views

Say I'm using Top2Vec as a topic model to capture the top 10 salient topics across documents. I have an array that contains the documents of the corpus. Initially, there are not enough documents to ...
NominalSystems
0 votes
0 answers
411 views

I have a question about using BERT as a generative model. I know BERT can be used for classification or fine-tuned for question answering. However, is it possible to use BERT to generate ...
Encipher
  • 185
1 vote
0 answers
51 views

The model's sample predictions that I'm printing during training are almost perfect but the model generates meaningless tokens during evaluation. For training I'm feeding it the source and target ...
Sean
  • 4,347
0 votes
0 answers
389 views

My goal is to create a regression model with text data where encoded text predicts a value (news headlines or article summaries predicting number of clicks). The y is very left-skewed (few articles ...
user3722736
0 votes
0 answers
109 views

I am currently following this post, which details how BERT was trained. I had a few questions about the classification task: In the post, it mentions that the authors of BERT decided to add ...
Victor M
  • 339
2 votes
0 answers
102 views

In a language like Ancient Greek, verbal forms are marked for voice (active/middle/passive). Deponent verbs are verbs that exist only in the middle (or passive) voice, but appear to have an active ...
2 votes
1 answer
139 views

I am doing an online course that states that the reason we use LSTMs and similar variations of vanilla RNNs is the vanishing/exploding gradient problems with vanilla RNNs. However, an ...
HelloWorld
0 votes
1 answer
125 views

WordPiece is a subword segmentation algorithm in the field of natural language processing. Unlike BPE, WordPiece selects the pair with the largest mutual information to merge at each step, and ...
korangar leo
1 vote
0 answers
75 views

I just started to get interested in natural language processing and I was trying to understand the skipgram model from word2vec. I was reading this interesting website. However, in the mentioned ...
edamondo
  • 111
3 votes
3 answers
622 views

I'm doing some analysis over natural language data, which basically entails: Computing some feature over all samples. Evaluating if this feature statistically significantly discriminates between ...
Andre Ye
0 votes
0 answers
74 views

There are papers on semantic analysis using metadata such as "Sentiment Classification on Steam Reviews" (https://cs229.stanford.edu/proj2017/final-reports/5244171.pdf) and "Detecting ...
soravoid
0 votes
1 answer
1k views

Llama2 is pretrained with 2 trillion tokens: $2\times10^{12}$, and its batch size is $4\times 10^6$. We can calculate the number of steps (times we update the parameters) per epoch as follows: $$\...
Noether
2 votes
0 answers
80 views

Suppose I would like to do extractive question answering on scientific literature. I'm interested in using BERT which was pretrained on Wiki and BookCorpus. I see two routes here: 1. Fine-tune BERT on ...
Jose Garcia
0 votes
1 answer
81 views

In an extreme (and probably impossible) example, could you not end up with all the power for the prediction being contained in the weights to the right of the embeddings layer?...and thus the matrix ...
osckt
  • 31
1 vote
1 answer
183 views

I am new to NLP and I'm not fully grasping how word2vec works. I understand that it aims to predict a word given its context or a context given a word but I'm not sure how the initial vector values ...
osckt
  • 31
0 votes
1 answer
688 views

I am new to the field of NLP and would appreciate any guidance please. I am trying to understand how word embeddings can be used in clustering and topic modelling. If I create word embeddings for ...
osckt
  • 31
1 vote
1 answer
302 views

How does training of word embeddings lead to the clustering of similar words in the embedding space? What causes that effect?
Glue
  • 515
1 vote
0 answers
277 views

I am trying to estimate the costs required for hosting a fine-tuned large language model for real-time inference. There will be hundreds of users querying the endpoint concurrently for multiple use cases ...
user3711946
1 vote
0 answers
63 views

Homophones; Indian surnames list; English last names. Can machine learning, natural language processing (NLP), or artificial intelligence assist in classifying, interpreting and specifying the differences ...
Prashant Akerkar
0 votes
1 answer
75 views

I had an idea of building a model using machine learning or deep learning in order to perform morphological tagging/labeling on untagged/unlabeled data. I have a lot of tagged/labeled data (about 30,...
Dolev Mitz
1 vote
2 answers
263 views

In transformer models, positional embeddings are commonly used to encode the positional information of words in a sequence. While sinusoidal positional embeddings are often employed, I'm curious about ...
Glue
  • 515
1 vote
1 answer
133 views

I have a case where I want to feed a network with polylines of data. The problem is that the input can be any number of polylines and the polylines can consist of any number of points. If we instead ...
JakobVinkas
3 votes
0 answers
533 views

I want to re-calculate the last column of Table 3 of Attention Is All You Need, i.e. the number of params in the models. But the numbers from my calculation do not match. Model Params from Table 3 ($\times 10^...
Judd
  • 31
1 vote
0 answers
222 views

Problem Setting/Context: I have feedback (each feedback has multiple sentences) associated with different products (you can safely assume that a feedback talks about one single product), I need to ...
1 vote
0 answers
41 views

I'm fine-tuning a large language model to predict binary sentiment, where a false negative is far more costly for my use case than a false positive. I've used weighted cross-entropy to account for ...
multiheadedattention
1 vote
2 answers
540 views

Both word2vec and transformer models compute a softmax function over the words/tokens on the output side. For word2vec models, negative sampling is used for computational reasons: Is negative sampling ...
CyberPlayerOne
2 votes
0 answers
123 views

https://arxiv.org/abs/2304.01933 shows that the best performing adapter-based parameter-efficient fine-tuning depends on the language model being fine-tuned: E.g., LoRA is the best adapter for LLaMA-...
Franck Dernoncourt
1 vote
1 answer
57 views

I'm running experiments to evaluate language models on Brazilian Portuguese datasets. I've divided each dataset into 10 parts, and I want to use cross-validation to determine the model's ...
Arthur Franco
0 votes
1 answer
289 views

I'm having trouble understanding why I get radically different results if I try to find the parameter of a Zipf distribution when I use the methods proposed by Clauset et al. (2009) as opposed to ...
MarcoLin8
