Sometimes I find things that I think are quite interesting and try to understand them.
                                        
                                        Inspired by mitxela.com
I spent this past summer working with Apple's GPU Post-Silicon Power & Performance team, and I've been surprised by just how much you can do post-silicon to improve the power and performance characteristics of an SoC.
I mainly worked on using post-silicon characteristics in clever ways to identify the exact architectural and software bottlenecks in a GPU. At first this seems near impossible: used naïvely, voltage and current measurements can at best tell you the lowest voltage that works at a given frequency. However, I learnt over the summer that you can play clever tricks by manipulating clocks, frequencies and GPU test code to isolate and identify bottlenecks, none of which I would have guessed was possible post-silicon before this summer.
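What I actually did is under NDA, but a generic, public toy version of the idea is frequency-sensitivity analysis. Assume (hypothetically) that a kernel's runtime follows T(f) = C/f + M, where C is work in cycles that scales with the GPU core clock f and M is time that does not (for example, waiting on memory that runs on its own clock). Two runs at different core clocks are then enough to solve for C and M and estimate how core-bound the kernel is. The model, function names and measurements below are made up for illustration; this is not Apple's methodology.

```python
# Hypothetical two-point model (an assumption for illustration, not Apple's method):
#   T(f) = C / f + M
# C: work, in core cycles, that scales with the core clock f
# M: time, in seconds, that does not (e.g. waiting on separately clocked memory)

def fit_two_point_model(f1, t1, f2, t2):
    """Solve for (C, M) from runtimes t1, t2 measured at core clocks f1, f2 (Hz)."""
    if f1 == f2:
        raise ValueError("need two different core frequencies")
    core_cycles = (t1 - t2) / (1.0 / f1 - 1.0 / f2)
    fixed_time = t1 - core_cycles / f1
    return core_cycles, fixed_time


def core_bound_fraction(core_cycles, fixed_time, f):
    """Share of the modelled runtime at frequency f that scales with the core clock."""
    core_time = core_cycles / f
    return core_time / (core_time + fixed_time)


# Made-up measurements: 3 ms at 1 GHz, 2 ms at 2 GHz.
C, M = fit_two_point_model(1.0e9, 3.0e-3, 2.0e9, 2.0e-3)
print(f"C = {C:.3g} cycles, M = {M:.3g} s, "
      f"core-bound at 1 GHz: {core_bound_fraction(C, M, 1.0e9):.0%}")
```

A fraction close to 1 suggests the kernel is limited by core throughput; close to 0 suggests something outside the core clock domain dominates. Real hardware rarely fits a two-term model this neatly, so in practice you would likely sweep more than two frequencies.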
I was also very pleasantly surprised by Apple: it is a very, very good place to work, and the people are wonderful.
Saturday, 16th August 2025
/behind-big-nda 
I took a class with Dr Brian Towles on parallel architectures (consistency, GPUs, accelerating ML, etc.), and for the final project ended up tackling the NP-hard problem of SAT solving.
SAT solving is essentially figuring out whether a given Boolean formula is satisfiable, i.e. whether some assignment of values to its variables (such as A=1, B=0, C=1) makes the formula true. It is hard because the number of candidate assignments grows exponentially: a formula with n variables has 2^n of them. It is also very useful, in fields ranging from mathematics to circuit testing to the design of new drugs.
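To make the definition concrete, here is a minimal brute-force checker in Python. It assumes formulas are in conjunctive normal form (an AND of OR-clauses) written with DIMACS-style integer literals, where `k` means variable k is true and `-k` means it is false; the function name is hypothetical. It simply tries every assignment, which is exactly the exponential cost described above.

```python
from itertools import product


def brute_force_sat(cnf, num_vars):
    """Try all 2**num_vars assignments; return the first satisfying one, or None.

    cnf is a list of clauses; each clause is a list of DIMACS-style literals
    (k = "variable k is true", -k = "variable k is false"), variables 1..num_vars.
    """
    for values in product([False, True], repeat=num_vars):
        assignment = {v + 1: values[v] for v in range(num_vars)}
        # The formula holds if every clause has at least one true literal.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf):
            return assignment
    return None


# (A or B) and (not B or C) and (not A or not B), with A=1, B=2, C=3
print(brute_force_sat([[1, 2], [-2, 3], [-1, -2]], 3))  # {1: False, 2: True, 3: True}
```

With 100 variables this loop would face 2^100 (about 1.3 × 10^30) assignments, which is why practical solvers such as MiniSAT use conflict-driven clause learning (CDCL) to prune the search instead of enumerating it.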
In theory this problem is very parallelisable; however, poor memory locality and the control-flow overhead associated with the problem make it hard to speed up. We ended up creating a custom architecture that can be summarized as a work-stealing grid of nodes, and our simulator showed a speedup of around 100x over the MiniSAT solver (the software state of the art). Better evaluation and thorough testing are obviously needed, but it seems promising.
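The general idea of work stealing can be sketched in a few lines of Python. This is a toy, single-threaded simulation under my own assumptions, not the SatSwarm architecture: nodes sit in a ring rather than a grid, each task is just the remaining depth of a binary search tree (as if branching on one SAT variable at a time), owners work from the newest end of their own deque, and an idle node steals the oldest task from its neighbour. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate_work_stealing(depth, num_nodes=4):
    """Toy work-stealing simulation; returns (leaves processed, steals, rounds).

    A task is the remaining depth of a binary tree of subproblems. Each round,
    every node takes one step: pop its newest task, or if its own deque is
    empty, steal the oldest task from the next node in the ring.
    """
    queues = [deque() for _ in range(num_nodes)]
    queues[0].append(depth)            # all work starts on node 0
    outstanding, leaves, steals, rounds = 1, 0, 0, 0
    while outstanding:
        rounds += 1
        for n in range(num_nodes):
            if queues[n]:
                task = queues[n].pop()             # owner: newest task
            else:
                victim = queues[(n + 1) % num_nodes]
                if not victim:
                    continue                       # nothing to steal this round
                task = victim.popleft()            # thief: oldest task
                steals += 1
            if task == 0:              # leaf, e.g. a complete assignment to check
                leaves += 1
                outstanding -= 1
            else:                      # branch: one task becomes two smaller ones
                queues[n].extend([task - 1, task - 1])
                outstanding += 1
    return leaves, steals, rounds


print(simulate_work_stealing(5))
```

With a single node, the 63 tasks of a depth-5 tree take 63 rounds; with four nodes, idle nodes pull work off their neighbours and the same tree finishes in fewer rounds. Stealing the oldest task tends to hand over large subtrees, which keeps the number of (expensive) steals low.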
Monday, 5th May 2025
/SatSwarm 
This summer I was fortunate enough to start working with Prof Bhattacharjee at Yale designing BCI chips. Specifically, I was designing a simulation tool that could model arbitrary BCI algorithms at the hardware level.
Something that stood out to me about the SCALO chip design was the idea of designing chips without an ISA.
Simply putting a generalised solution for a specific task onto an accelerator, and then creating a network of such accelerators optimised by some graph scheme, could lead to a new way of doing computation that extends beyond just BCIs.
There is real value to be added in moving algorithms closer to the hardware, so that they compute faster and more energy-efficiently.
Saturday, 21st September 2024
/closed-source-for-now-:( 
How does one evaluate a new design choice for a hardware system? It's unreasonable to write RTL and testbenches on the scale of a fully functioning x86 processor: too much work and computation for something that could turn out to be a bad idea.
I recently started using simulator models for a research project, specifically gem5 (and SimpleScalar, although I didn't like that as much). The ability to model entire systems and run full programs, including operating systems, to evaluate them is essential for chip design, and it's surprisingly incomplete for something so important (at least in the public domain).
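As a much smaller illustration of why simulators are useful, here is a toy direct-mapped cache model in Python that answers a design question (does doubling the cache help this workload?) without writing any RTL. It is nothing like gem5's actual interfaces; the function, the address trace and the cache parameters are hypothetical.

```python
def direct_mapped_hit_rate(trace, num_lines, line_bytes):
    """Hit rate of a byte-address trace on a toy direct-mapped cache."""
    tags = [None] * num_lines        # None marks an empty (invalid) line
    hits = 0
    for addr in trace:
        block = addr // line_bytes   # which memory block the address falls in
        index = block % num_lines    # the only cache line that block may occupy
        tag = block // num_lines     # distinguishes blocks sharing that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag        # miss: fill (or evict and refill) the line
    return hits / len(trace) if trace else 0.0


# Hypothetical workload: sweep a 4 KiB array of 8-byte words twice.
trace = [(i % 512) * 8 for i in range(1024)]
for size_kib in (2, 4):
    lines = size_kib * 1024 // 64
    rate = direct_mapped_hit_rate(trace, lines, 64)
    print(f"{size_kib} KiB cache, 64 B lines: hit rate {rate:.4f}")
```

For this sweep the 2 KiB cache hits 87.5% of the time and the 4 KiB cache 93.75%, because only the larger cache holds the whole 4 KiB working set for the second pass. Full-system simulators like gem5 answer the same kind of question in far more detail (pipelines, coherence, an OS running on top).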
This also makes me curious about what simulators for very closed and secretive architectures (e.g. Apple's M-series chips) are like.
Saturday, 21st September 2024
/CheckerChip 
CUDA with C is very fast. GPUs in general are very fast, but finding independent instructions that can run in parallel is difficult and often not possible. If it always were possible, I think no one would use CPUs except in niche cases, as GPUs are inherently more power-efficient and faster when they are given enough independent work.
I learnt how to write CUDA C code: managing memory allocation and host-device transfers, and writing kernels. Compared to a CPU implementation of a three-body problem simulation I wrote, the speedup was ~70x, which was very impressive.
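The reason this kind of problem maps well onto a GPU is visible in the CPU reference computation. Below is a minimal Python sketch (not my original code; the function name, units with G = 1 and the softening length `eps` are assumptions) of the direct O(n²) gravitational acceleration sum. Each body's result depends only on the shared input positions, so the iterations of the outer loop are independent; in a CUDA kernel, each one would typically become its own thread, indexed with `blockIdx.x * blockDim.x + threadIdx.x`.

```python
import math


def accelerations(positions, masses, G=1.0, eps=1e-3):
    """Newtonian gravitational acceleration on every body (direct O(n^2) sum).

    positions: list of (x, y, z); masses: list of floats.
    eps is an assumed softening length that avoids division by zero at close range.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):                   # independent iterations: one GPU thread each
        xi, yi, zi = positions[i]
        for j in range(n):
            if j == i:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = G * masses[j] / (r2 * math.sqrt(r2))   # G * m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

A quick sanity check: by Newton's third law, the mass-weighted accelerations should sum to (approximately) zero.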
The initial motivation was to significantly speed up the MSFragger algorithm for proteomics.
Saturday, 25th May 2024
/Quaint-Collection-Of-Projects/CUDA_in_C 
I have been quite interested in reinforcement learning (Q-learning in particular) for a while, so I decided to try to beat DOOM with it.
The general idea was to create agents that play DOOM and update their strategy based on a reward, using the PPO algorithm. It worked quite well for the first level (albeit the model only needs to move left, right or shoot based on visual cues).
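PPO's key idea is its clipped surrogate objective, L^CLIP(θ) = E_t[min(r_t(θ)Â_t, clip(r_t(θ), 1 - ε, 1 + ε)Â_t)], where r_t(θ) is the ratio of the new policy's probability of the action taken to the old policy's, and Â_t is the advantage estimate. A minimal Python sketch of that objective over a batch is below; the function and argument names are hypothetical, and in practice this would be computed on tensors inside an RL library rather than in a loop.

```python
import math


def ppo_clip_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximised) over a batch.

    The probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) is
    computed from log-probabilities for numerical stability.
    """
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: take the smaller of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to push the ratio outside [1 - ε, 1 + ε] in a single update, which keeps each policy update small; training typically maximises this (that is, minimises its negative) together with value-function and entropy terms.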
I'd quite like to work with the Neural MMO environment next, developed by Joseph Suarez. It's a very RL-friendly environment that simulates a very, very complex system.
Saturday, 25th May 2024
/Quaint-Collection-Of-Projects/Beat_DOOM 
I thought I'd start with this website. I learnt a bit of HTML, CSS and other web-dev stuff, and thought I'd put it to use making a place for the things I've done.
I soon realised that writing custom CSS for everything is very much not worth it, so I used Tailwind CSS. The code for this website is on my GitHub, and is probably quite unreadable.
Saturday, 25th May 2024
/shaan106.github.io
soon
/Shaan106