At its core, decision-making is about answering questions. Should we launch this product? Should I order take-out for dinner? Should I go on this date? These are conditional questions that reflect some number of possible futures. The decision occurs when you select one of the futures and take the action that it prescribes.
From a mechanical point of view, decision-making is straightforward. Each time you come to a branch in the road, you select one of the options and proceed onwards. If you aren’t concerned with the ultimate destination, it’s a simple process. …
One of the core tasks in information retrieval is searching. Anyone who deals with large amounts of text data (and that’s almost all of us) knows how difficult this seemingly simple task can be. If your search term is too broad, you may find yourself sifting through an impossible quantity of documents. And if your search term is too narrow, you could be missing out on relevant results. So how do we decide which documents are the most relevant to our search?
Search relevance is a difficult problem — and modern search engines employ highly sophisticated (and proprietary) algorithms to deal with the issue. We won’t delve into those algorithms, but let’s look at some simple strategies that you might employ in your own information retrieval applications. …
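As a rough illustration of what a "simple strategy" can look like, here is a minimal sketch of TF-IDF scoring using only the Python standard library. It is not necessarily the approach the article goes on to describe; the function names (`tokenize`, `rank_documents`) and the smoothing choice (adding 1 to the IDF) are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return (score, index) pairs sorted from most to least relevant.

    Hypothetical helper: scores each document by summing TF-IDF weights
    of the query terms it contains.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term_counts[term] == 0:
                continue  # term absent from this document
            tf = term_counts[term] / len(tokens)
            # Assumed smoothing: +1 keeps terms found in every document
            # from scoring exactly zero.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            score += tf * idf
        scores.append((score, index))

    return sorted(scores, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_documents("search engines", docs)` on a small list of strings returns the documents that mention the query terms first. Terms that are rare across the collection count for more than terms that appear everywhere, which is one way to balance search terms that are too broad against ones that are too narrow.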
Writing is humanity’s superpower — when done well, it informs, provokes, and entertains. Perhaps that is why blogging is so popular among programmers. We’re a naturally curious community and sharing knowledge is an integral part of our ethos.
For the reader, the benefits of good writing are obvious. When an author takes the time to prepare a high-quality article, knowledge flows seamlessly from one mind to another. I would argue, though, that the benefits may be even greater for the writer. Writing a good technical article requires deep research, careful thought, and a significant amount of experimentation. …
Natural language processing (NLP) is a complex and evolving field. Part computer science, part linguistics, part statistics — it can be a challenge deciding where to begin. Books and online courses are a great place to start, and project-based learning is always a good idea, but at some point it becomes necessary to dig deeper, and that means looking at the academic literature.
Reading academic literature is an art unto itself, and just because a paper is popular doesn’t mean it’s the right place for a beginner. However, there is something to be said for papers that have both withstood the test of time and been widely accepted by experts. …
One of the great things about using Python for natural language processing (NLP) is the large ecosystem of tools and libraries. From tokenization to machine learning to data visualization, Python has something for every NLP task in your workflow. Of course, choosing the *right* tool isn’t always so easy. Every NLP library provides slightly different functionality and has a slightly different implementation. The key to finding the right tool is being aware of what is out there and experimenting with each option until you know each tool’s strengths and weaknesses. To that end, provided below is a list of the major NLP tools in use today. …
The Stanford NLP Group has long been an active player in natural language processing, particularly through their well-known CoreNLP Java toolkit. Until recently though, Stanford NLP has been a less well-known player in the Python community, which is a shame since many NLP practitioners work primarily in Python. But there’s good news! Stanford NLP’s Stanza Python library is coming into its own with the recent release of version 1.1.1!
The new Stanza version supports 66 different human languages (which is a big step forward, since NLP has long been very English-centric) and can carry out core NLP tasks like lemmatization and named entity recognition. …
In 2016, Cal Newport introduced a new term into the business lexicon: deep work. It’s an idea that has since taken hold of disaffected knowledge workers everywhere, due in no small part to the promise that they could finally start doing what they were hired to do — create value. More importantly, intertwined with this promise is something more nebulous — something fragile and fleeting. Dare we call it self-actualization? Anyone who has worked in a modern office knows the creeping sense that what you’re doing doesn’t really matter. It’s that unspoken but ever-present worry that your life is little more than a series of TPS reports. …
Much has been made of the question of what it means to be an informed citizen. We’re instructed to “read widely,” “engage in debate,” “seek out new viewpoints,” etc. The message is clear: the more information you consume, the better informed you will be. On its face, this is reasonable and well-intentioned advice. The problem is that it’s also completely wrong. We’ve become so enamored with the availability of information that we’re forgetting to first judge the quality of our information. For years I practiced this kind of information consumption, particularly with the daily news, only to become burned out by an overload of unfiltered, inaccurate, biased, and ultimately low-quality information. …
When Charlie bit his brother’s finger, little did he know that he was unleashing a virus that would burrow into the consciousness of nearly a billion YouTube viewers. Charlie created a meme without even knowing what a meme was. Or perhaps his brother, the victim of said bite, deserves credit—after all, it was his half-laughing, half-crying narration that made the phrase “Charlie bit me” famous. Or did the meme create itself—seizing an opportunity to launch upon an unsuspecting world?
The meme, not at all mindful of who Charlie was or caring much for his habit of biting fingers, spotted an opportunity. And like any self-respecting virus, it set out to replicate itself far and wide, riding on the backs of two playful brothers and humankind’s susceptibility to awwww-inducing moments. …
One of the best things about programming is that there are many ways to solve the same problem. Of course, this is also one of the most difficult things about programming. For new programmers, the seemingly endless array of design patterns, best practices, techniques, principles, and all other manner of prescriptive dogma is intimidating at best and deeply demoralizing at worst. But there is good news hiding among the panoply of architectural choices: As different as they may seem, all such patterns eventually lead back to the same foundational principles. …