Hacker News Clone

Ask HN: Which recent research paper blew your mind?

by froster on 7/24/2023, 12:34:25 PM with 171 comments

by edent on 7/24/2023, 2:31:11 PM
"Overview of SHARD: A System for Highly Available Replicated Data" it's the first paper to introduce the concept of database sharding. It was published in 1988 by the Computer Corporation of America.
It is referenced hundreds of times in many classic papers.
But, here's the thing. It doesn't exist.
Everyone cites Sarin, DeWitt & Rosenb[e|u]rg's paper but none have ever seen it. I've emailed dozens of academics, libraries, and archives - none of them have a copy.
So it blows my mind that something so influential is, effectively, a myth.
by w-m on 7/24/2023, 2:52:28 PM
Integral Neural Networks (CVPR 2023 Award Candidate), a nifty way of building resizable networks.
My understanding of this work: A forward pass for a (fully-connected) layer of a neural network is just a dot product of the layer input with the layer weights, followed by some activation function. Both the input and the weights are vectors of the same, fixed size.
Let's imagine that the discrete values that form these vectors happen to be samples of two different continuous univariate functions. Then we can view the dot product as an approximation to the value of integrating the multiplication of the two continuous functions.
Now instead of storing the weights of our network, we store some values from which we can reconstruct a continuous function, and then sample it where we want (in this case some trainable interpolation nodes, which are convolved with a cubic kernel). This gives us the option to sample different-sized networks, but they are all performing (an approximation to) the same operation. After training with samples at different resolutions, you can freely pick your network size at inference time.
You can also take pretrained networks, reorder the weights to make the functions as smooth as possible, and then compress the network, by downsampling. In their experiments, the networks lose much less accuracy when being downsampled, compared to common pruning approaches.
Paper: https://openaccess.thecvf.com/content/CVPR2023/papers/Solods...
Code: https://github.com/TheStageAI/TorchIntegral
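To make the "dot product as an approximate integral" idea concrete, here is a minimal sketch in plain Python. It is not the paper's method or the TorchIntegral API: the two continuous functions (`layer_input`, `layer_weight`) are made up for illustration, there are no trainable interpolation nodes, and the dot product is scaled by 1/n so that it becomes a midpoint Riemann sum. The point is only that sampling at different resolutions approximates the same underlying value.

```python
import math


def layer_input(t):
    # Hypothetical continuous "input" function on [0, 1] (an assumption for this sketch).
    return math.sin(math.pi * t)


def layer_weight(t):
    # Hypothetical continuous "weight" function on [0, 1] (an assumption for this sketch).
    return t * t


def sampled_dot(n):
    """Sample both functions at the n midpoints of [0, 1] and return dot(x, w) / n."""
    ts = [(i + 0.5) / n for i in range(n)]
    x = [layer_input(t) for t in ts]
    w = [layer_weight(t) for t in ts]
    return sum(xi * wi for xi, wi in zip(x, w)) / n


# Closed form of the integral of t^2 * sin(pi * t) over [0, 1].
EXACT = 1 / math.pi - 4 / math.pi ** 3

if __name__ == "__main__":
    for n in (8, 32, 128):
        approx = sampled_dot(n)
        print(n, approx, abs(approx - EXACT))
```

Every "layer size" n computes roughly the same number, which is the property that lets a network be resampled to a different width.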
by the_snooze on 7/24/2023, 1:34:25 PM
"Blue Is the New Black (Market): Privacy Leaks and Re-Victimization from Police-Auctioned Cellphones"
https://krebsonsecurity.com/2023/05/re-victimization-from-po...
Researchers bought up a bunch of seized phones from police auction sites and found about 25% of them were trivially unlockable and still held sensitive data about suspects and victims.
by snarfed on 7/24/2023, 2:07:48 PM
Not recent but legendary: "Latency Lags Bandwidth" David Patterson http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...
Recent:
"How to Hack the Simulation?" Roman Yampolskiy https://www.researchgate.net/publication/364811408_How_to_Ha...
"On the Computational Practicality of Private Information Retrieval" Radu Sion, Bogdan Carbunar https://zxr.io/research/sion2007pir.pdf
(via "Explained from scratch: private information retrieval using homomorphic encryption," https://blintzbase.com/posts/pir-and-fhe-from-scratch/ )
by Daviey on 7/25/2023, 6:51:16 AM
I suppose recent is subjective, but this one totally blew my mind. It's a 260-year-old academic publication, which is older than the USA:
```
  Mr. Bayes and Mr. Price. “An Essay towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S.” In: Philosophical Transactions of the Royal Society of London 53.0 (1763), pp. 370–418. DOI: 10.1098/rstl.1763.0053. URL: https://doi.org/10.1098/rstl.1763.0053
```
Not only is this publication amazing because it is the source of conditional probability, which is the basis for Bayes's Theorem, but a dated original of the document is available. The thing I find really beautiful is that it was submitted 2 years after the author's death by his friend, who found the material when going through his things. We will never know how much was changed prior to submission, but I find it a really beautiful tribute to his friend: the co-author makes it clear that the attribution should go to Bayes, and that letter still exists today (and it is a lovely read). I can't think of anything similar that has been preserved for 260 years and is still a somewhat useful academic publication. I felt privileged to cite this reference in my MSc thesis.
by philipkglass on 7/24/2023, 4:45:24 PM
"Liquid solution centrifugation for safe, scalable, and efficient isotope separation"
https://www.science.org/doi/10.1126/sciadv.adg8993
The authors show that a biological type laboratory ultracentrifuge can efficiently function as a near-universal isotope separator. Any element that can be dissolved as a salt in water -- the entire periodic table, excepting the noble gases -- can be enriched according to its relative mass. This can reduce the cost of refining certain isotopes like calcium-48 by orders of magnitude compared to the previous best techniques.
Left unsaid, but implied by its universality: the new technique is also a new approach to producing enriched fissile materials for nuclear reactors and weapons. It requires less chemical engineering sophistication than current processes which require production and handling of gaseous uranium hexafluoride.
by philipkglass on 7/24/2023, 4:34:27 PM
"Co-cultivation enhanced microbial protein production based on autotrophic nitrogen-fixing hydrogen-oxidizing bacteria"
https://www.sciencedirect.com/science/article/abs/pii/S13858...
Certain bacteria can directly assimilate a mixture of hydrogen, carbon dioxide, and nitrogen to produce protein. You could consider it an alternative to bacterial nitrogen fixation in root nodules with much higher productivity. Or you could consider it an alternative to the Haber-Bosch process with much milder reaction conditions -- ambient temperature and pressure. It's a way to turn intermittent electricity into protein with simple, robust equipment. I wouldn't be surprised if this or a related development ultimately supplants much of the current demand for synthetic nitrogen fertilizers.
by d-- on 7/24/2023, 3:36:10 PM
"Enso: A Streaming Interface for NIC-Application Communication" https://www.microsoft.com/en-us/research/uploads/prod/2023/0...
We've been using the same API to communicate with our NICs since 1994. That API severely limits network throughput and latency. By simply changing the API (no new NIC) you can get 6x higher throughput in some apps and 43% lower latency.
Code runs on FPGA NIC only for now: https://github.com/crossroadsfpga/enso
Won USENIX OSDI best paper award and best artifact award.
by GregarianChild on 7/24/2023, 2:03:51 PM
Someone managed to GPU-accelerate program synthesis, a form of symbolic ML. First time for ML that is not deep learning:
https://dl.acm.org/doi/10.1145/3591274
Deep learning took off precisely when the ImageNet paper dropped around 2010. Before that, nobody believed that backprop could be GPU-accelerated.
by Uptrenda on 7/24/2023, 6:01:45 PM
I find the most interesting papers I read all come from the same place: The National Library of Medicine https://www.nlm.nih.gov/
Here's some recent papers I liked:
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413749/ -- Lithium is used as a mood stabilizer for bipolar disorder and other disorders. The form of lithium used in psychiatry is lithium carbonate, but other forms also exist. As a supplement, there is lithium orotate, which some people use to help them sleep, deal with stress, and so on. This paper puts forward the idea that lithium orotate is preferable to lithium carbonate because lower quantities are needed for the same therapeutic results, resulting in fewer side effects.
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525098/ -- In bipolar disorder it's known that there are abnormalities in the levels of brain-derived neurotrophic factor (BDNF). What's interesting about this is that treatments for bipolar help to increase BDNF, which may be of interest to those who are into nootropics.
The papers on this site are honestly some of the best written, in-depth, and accessible works I've come across anywhere. There's enough information here to live a better life if you're willing to sift through papers. No joke.
by dargscisyhp on 7/24/2023, 2:43:11 PM
I thought the AlphaZero paper was pretty cool: https://arxiv.org/abs/1712.01815
Not only did we get a whole new type of Chess engine, it was also interesting to see how the engine thought of different openings at various stages in its training. For instance, the Caro-Kann, which is my weapon of choice, was favored quite heavily by it for several hours and then seemingly rejected (perhaps it even refuted it?!) near the end.
by javajosh on 7/24/2023, 5:14:29 PM
I think you meant "recently published paper" but others are bringing up old stuff. I've always enjoyed Einstein's (translated) papers. They are both precise and readable, and his reputation for genius is well deserved!
This one is interesting: "On the Influence of Gravitation on the Propagation of Light," which you can read here: https://einsteinpapers.press.princeton.edu/vol3-trans/393. This was his initial stab at GR. It was a simple approach, later abandoned, that treated light as slowing down in a gravity well, rather than keeping its speed constant and having mass warp spacetime. I suppose I like that he pursued an idea that didn't work and was forced to do something far more complex. It's kinda relatable.
by ranprieur on 7/24/2023, 4:50:59 PM
Placebo Effect Grows in U.S., Thwarting Development of Painkillers
https://www.scientificamerican.com/article/placebo-effect-gr...
The most interesting thing is that "placebo responses are rising only in the United States."
by di4na on 7/24/2023, 4:18:06 PM
All the stuff coming out of the Koka and Effekt development.
In particular, last week I read their FBiP2 paper https://www.microsoft.com/en-us/research/uploads/prod/2023/0...
by intended on 7/24/2023, 2:59:51 PM
Content Moderation / Trust and Safety person
OpenAI’s content moderation paper, "A Holistic Approach to Undesired Content Detection in the Real World".
https://arxiv.org/pdf/2208.03274.pdf
Lots of interesting facts are strewn around the paper.
——-
The first paper that squarely talked about the language resource gap in CS/ML. Before this came out, it was hard to explain just how stark the gap between English and other languages was.
Lost in Translation: Large Language Models in Non-English Content Analysis
https://cdt.org/insights/lost-in-translation-large-language-...
——
This paper gets in for the title:
“I run the world’s largest historical outreach project and it’s on a cesspool of a website.” Moderating a public scholarship site on Reddit: A case study of r/AskHistorians
https://drum.lib.umd.edu/bitstream/handle/1903/25576/CSCW_Pa...
——
This was the first paper I ended up saving on online misinformation. The early attempts to find solutions.
The Spreading of Misinformation online, https://www.pnas.org/doi/10.1073/pnas.1517441113
What I liked here was the illustration of how messages cascade differently based on the networks the message is traveling through.
by the-mitr on 7/24/2023, 2:31:03 PM
More is Different by P. W. Anderson (1972), 'arguing that “at each level of complexity entirely new properties appear”; that is, although, for example, chemistry is subject to the laws of physics, we cannot infer the field of chemistry from our knowledge of physics.' The paper: https://cse-robotics.engr.tamu.edu/dshell/cs689/papers/ander...
Also its impact https://www.nature.com/articles/s42254-022-00483-x
by bitxbitxbitcoin on 7/24/2023, 1:43:14 PM
“A classification of endangered high-THC cannabis (Cannabis sativa subsp. indica) domesticates and their wild relatives”
By McPartland and Small.
Moving on from cannabis sativa indica and cannabis sativa sativa to cannabis sativa indica himalayensis and cannabis sativa indica asperrima, depending on distribution from the original location of the extinct ancient cannabis wildtype.
Following this new classification, I believe there’s a third undocumented variety in North East Asia.
If anyone else has noticed the samesameification of cannabis strains and is wondering what the path forward is, this may be illuminating.
https://phytokeys.pensoft.net/article/46700/
by Silamoth on 7/24/2023, 3:02:11 PM
I recently read "Enabling tabular deep learning when d ≫ n with an auxiliary knowledge graph" (https://arxiv.org/pdf/2306.04766.pdf) for one of my graduate classes. Essentially, when there are significantly more data points than features (n >> d), machine learning usually works fine (assuming data quality, an underlying relationship, etc.). But, for sparse datasets where there are fewer data points than features (d >> n), most machine learning methods fail. There's just not enough data to learn all the relationships. This paper builds a knowledge graph based on relationships and other pre-existing knowledge of data features to improve model performance in this case. It's really interesting - I hadn't realized there were ways to get better performance in this case.
by maurits on 7/24/2023, 1:31:39 PM
LoRA: Low-Rank Adaptation of Large Language Models [1]
[1]: https://arxiv.org/abs/2106.09685
by jeffbee on 7/24/2023, 1:46:13 PM
https://cseweb.ucsd.edu/~tullsen/halfandhalf.pdf
Half&Half: Demystifying Intel’s Directional Branch Predictors for Fast, Secure Partitioned Execution
by lwansbrough on 7/25/2023, 8:39:54 AM
Detailed Rigid Body Simulation with Extended Position Based Dynamics
https://matthias-research.github.io/pages/publications/PBDBo...
A new way of doing real time physics that dramatically outperforms state of the art by simply introducing a new algorithm. No crazy AI or incremental improvements to existing approaches.
by carapace on 7/24/2023, 3:15:13 PM
"Cyclic Combinational Circuits", by Marc D. Riedel
> we present theoretical justification for the claim that the optimal form of some [combinational] circuits requires cyclic topologies. We exhibit families of cyclic circuits that are optimal in the number of gates, and we prove lower bounds on the size of equivalent acyclic circuits.
http://www.mriedel.ece.umn.edu/wiki/images/7/7a/Riedel_Cycli...
by danesparza on 7/24/2023, 2:58:28 PM
Not all of these are research papers. But all are fairly recent.
Gene linked to long COVID found in analysis of thousands of patients https://www.nature.com/articles/d41586-023-02269-2
Surfactants safely take down mosquitoes without using insecticides https://newatlas.com/science/surfactants-safely-take-down-mo...
This is what our Milky Way galaxy looks like when viewed with neutrinos https://arstechnica.com/science/2023/06/ghost-particles-have...
by dooraven on 7/24/2023, 1:38:38 PM
Can Language Models Teach Weaker Agents? Teacher Explanations Improve Students via Theory of Mind
https://arxiv.org/abs/2306.09299
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
https://huggingface.co/papers/2307.10373
Need to see code for second one.
by pyuser583 on 7/25/2023, 5:38:56 PM
“On the frequency and severity of interstate war” by Aaron Clauset.
https://arxiv.org/pdf/1901.05086.pdf
Analyzes a claim made in the 1950s by a prominent statistician: the frequency of interstate wars follows a simple Poisson arrival process and their severity follows a simple power-law distribution.
He was right, but it’s not clear why.
The paper is interesting because it shows how a new bit of knowledge creates a large number of known unknowns from previously unknown unknowns.
It also shows statistical magic was very much possible prior to computers.
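For anyone who wants to play with the two ingredients of that claim, here is a rough sketch using only Python's standard library. It is not the paper's analysis: the function name `simulate_wars` and all parameter values (rate, minimum severity, tail exponent) are hypothetical placeholders, not estimates from the data. Onsets are generated from exponential gaps (a Poisson arrival process) and severities from a Pareto (power-law) tail.

```python
import random


def simulate_wars(years, rate_per_year, x_min, alpha, seed=0):
    """Return a list of (onset_time, severity) pairs.

    Onsets follow a Poisson process with the given rate; severities follow a
    Pareto distribution with minimum x_min and tail exponent alpha.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    t = 0.0
    wars = []
    while True:
        # Exponential waiting times between events give Poisson arrivals.
        t += rng.expovariate(rate_per_year)
        if t > years:
            break
        # paretovariate(alpha) returns values >= 1, so scale by x_min.
        severity = x_min * rng.paretovariate(alpha)
        wars.append((t, severity))
    return wars


if __name__ == "__main__":
    wars = simulate_wars(years=200, rate_per_year=0.5, x_min=1000, alpha=0.5)
    print(len(wars), "simulated wars")
    print("largest severity:", max(s for _, s in wars))
```

With a small tail exponent the largest simulated event dwarfs the typical one, which is the heavy-tail behavior that makes severity so hard to reason about.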
by manvel_hn on 7/24/2023, 1:45:44 PM
Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/abs/2302.04761
Older one, but still very nice work.
by gabitoju on 7/24/2023, 6:36:28 PM
"C-Store: A Column-oriented DBMS": https://web.stanford.edu/class/cs345d-01/rl/cstore.pdf
By among others, the great Mike Stonebraker.
by Jalad on 7/24/2023, 3:31:18 PM
"Bounding data races in space and time" was an interesting one I saw recently! It's discussing the memory models of programming languages, and how they can fail pretty horribly when data races occur, and then talks about ways to avoid those. OCaml's multicore support is based on this work, meaning it's memory safety guarantees in when data races occur are pretty interesting
https://kcsrk.info/papers/pldi18-memory.pdf
https://youtube.com/watch?v=eXXzUzt_nAY
by dan-g on 7/24/2023, 2:05:41 PM
Generative Agents: Interactive Simulacra of Human Behavior[1]. Make sure to check out the recorded demo!
[1] https://arxiv.org/abs/2304.03442
by msravi on 7/24/2023, 3:58:43 PM
Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm [0]
Also known as the Viterbi algorithm. Every digital communication device in existence today most likely has an implementation of it.
Later proved optimal by Forney [1]
0. https://www.essrl.wustl.edu/~jao/itrg/viterbi.pdf
1. https://www2.isye.gatech.edu/~yxie77/ece587/viterbi_algorith...
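For the curious, here is a minimal sketch of hard-decision Viterbi decoding in plain Python. The specific code is my choice for illustration, not something taken from either paper: a small rate-1/2, constraint-length-3 convolutional code with generators 111 and 101, and the helper names (`step`, `encode`, `viterbi_decode`) are hypothetical. The decoder keeps, for each of the 4 encoder states, the cheapest path (in Hamming distance to the received bits) that ends in that state.

```python
def step(state, bit):
    """Advance the 2-bit shift register by one input bit.

    Returns (next_state, (out1, out2)) for generators 111 and 101.
    """
    s1, s0 = (state >> 1) & 1, state & 1  # s1 is the most recent past bit
    out = (bit ^ s1 ^ s0, bit ^ s0)
    return (bit << 1) | s1, out


def encode(bits):
    """Encode a list of 0/1 bits, starting from the all-zero state."""
    state, coded = 0, []
    for b in bits:
        state, pair = step(state, b)
        coded.extend(pair)
    return coded


def viterbi_decode(received):
    """Return the input bit sequence whose encoding is closest to `received`."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]  # the encoder is known to start in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                nxt, out = step(state, bit)
                # Branch cost is the Hamming distance to the received pair.
                cost = metrics[state] + (out[0] != r[0]) + (out[1] != r[1])
                if cost < new_metrics[nxt]:  # keep only the survivor path
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # the last two zeros flush the register
    noisy = encode(message)
    noisy[3] ^= 1  # flip one bit to simulate channel noise
    print(viterbi_decode(noisy) == message)
```

Real decoders use soft decisions and a bounded traceback instead of storing whole paths, but the survivor-per-state idea is the same.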
by masfuerte on 7/24/2023, 3:41:15 PM
Grid-free Monte Carlo for PDEs with spatially varying coefficients. https://cs.dartmouth.edu/wjarosz/publications/sawhneyseyb22g...
by cratermoon on 7/24/2023, 5:15:38 PM
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? <https://dl.acm.org/doi/10.1145/3442188.3445922>
by rurban on 7/25/2023, 4:50:50 AM
https://arxiv.org/pdf/1703.04234.pdf
On why Oppenheimer didn't get the Nobel for his black hole paper decades before someone else got it for a rediscovery.
by freedude on 7/24/2023, 6:00:38 PM
An analysis of studies pertaining to masks in Morbidity and Mortality Weekly Report: Characteristics and quality of all studies from 1978 to 2023
"0/77 were randomized studies."
https://www.medrxiv.org/content/10.1101/2023.07.07.23292338v...
Here is a pdf.
https://www.medrxiv.org/content/10.1101/2023.07.07.23292338v...
by tracker1 on 7/25/2023, 2:43:57 PM
TBH, most of the research papers I've read the past few years have been around diet/nutrition. I can't think of anything that recently blew my mind; some of it is definitely interesting, but most of it is low-quality noise.
Probably the biggest thing that blows my mind is the suppression of the Minnesota Coronary Study in the early 1960's. Literally half a century of dis/misinformation from the govt, pharma and medical industries that was disproven long ago. Nothing higher quality or more definitive since.
Basically, the whole "limit cholesterol and saturated fat intake in favor of more grains and seed oils" advice is based on a theory that was disproven long ago, and it's still pushed to this day. Why? There's big money in pharma and agriculture (corn, soy, wheat).
by jrmiii on 7/24/2023, 1:45:51 PM
I was impressed by the Voyager paper on a GPT-powered Minecraft bot.
https://voyager.minedojo.org/
by 0asa on 7/25/2023, 2:56:31 PM
I use Raycast Pro, which includes Raycast AI. It's awesome: it offers a clever integration of LLMs into the operating system.
by aaron695 on 7/24/2023, 3:09:39 PM
RNA demethylation increases the yield and biomass of rice and potato plants in field trials (2021)
https://www.nature.com/articles/s41587-021-00982-9