Yesterday I gave a talk on using Elasticsearch for .NET developers here in Ottawa. The slides were mostly from my presentation at DevTeach Montreal last year.
You can find the slides on SlideShare.
Transcript:
1. STORE 2 MILLION AUDIT LOGS A DAY INTO ELASTICSEARCH Taswar Bhatti (Microsoft MVP) GEMALTO @taswarbhatti http://taswar.zeytinsoft.com taswar@gmail.com
2. WHO AM I? – 4 years Microsoft MVP – 17 years in the software industry – Currently working as System Architect in the Enterprise Security space (Gemalto) – You may not have heard of Gemalto, but 1/3 of the world population uses Gemalto; they just don't know it – Gemalto has stacks built in many environments: .NET, Java, Node, Lua, Python, mobile (Android, iOS), e-banking, etc.
3. AGENDA – Problem we had and wanted to solve with Elastic Stack – Intro to Elastic Stack (Ecosystem) – Logstash – Kibana – Beats – Elasticsearch flow designs that we considered – Future plans for using Elasticsearch
4. QUESTION & POLL – How many of you are using Elastic or some other logging solution? – How do you normally log? Where do you log? – Do you log to a relational database?
5. HOW DO YOU TROUBLESHOOT OR FIND YOUR BUGS – Typically in a distributed environment one has to go through the logs to find out where the issue is – There could be multiple systems to go through to find which machine/server generated the log, or multiple logs to monitor – You may even have to monitor firewall logs to find which data center the traffic was routed through – Chuck Norris never troubleshoots; the trouble kills itself when it sees him coming
7. OUR PROBLEM – We had distributed systems (microservices) that generated many different types of logs, in different data centers – We also had authentication audit logs that had to be stored securely for 1 year – We generate around 2 million audit log records a day, 4 TB with replication – We need to generate reports from our data for customers – We were still using a monolithic solution in some core parts of the application – Growing pains of a successful application – We want to use a centralized scalable logging system for all our
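Customer reports like these usually become aggregation queries. Here is a minimal Python sketch of such a search body. It counts one customer's audit events per day with a date_histogram aggregation. The field names customer_id (assumed to be mapped as keyword) and @timestamp are hypothetical, and calendar_interval is the parameter name used from Elasticsearch 7.2 onward, whereas older versions used interval.

```python
import json


def daily_report_query(customer_id: str, start: str, end: str) -> dict:
    """Search body counting one customer's audit events per day.

    customer_id and @timestamp are hypothetical field names.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                # filter clauses do not score, which suits exact report criteria
                "filter": [
                    {"term": {"customer_id": customer_id}},
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            "events_per_day": {
                # "calendar_interval" is the 7.2+ name; older versions used "interval"
                "date_histogram": {"field": "@timestamp", "calendar_interval": "day"}
            }
        },
    }


body = daily_report_query("acme", "2017-09-01", "2017-10-01")
print(json.dumps(body, indent=2))
```

The body would be sent to the _search endpoint of the audit indices, and each returned bucket holds a day and its document count.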
8. FINDING BUGS THROUGH LOGS
9. A LITTLE HISTORY OF ELASTICSEARCH – Shay Banon created Compass in 2004 – Released the first version of Elasticsearch in 2010 – The company Elasticsearch was formed in 2012 – Shay's wife is still waiting for her recipe app
11. ELASTIC STACK
12. ELASTICSEARCH – Written in Java, built on Apache Lucene – Schema-free, REST- and JSON-based document store – Search engine – Distributed, horizontally scalable – No separate database storage; storage is handled by Lucene – Apache 2.0 License
13. COMPANIES USING ELASTIC STACK
14. ELASTICSEARCH INDICES – Elasticsearch organizes documents in indices – Lucene writes and maintains the index files – Elasticsearch writes and maintains metadata on top of Lucene, for example field mappings, index settings and other cluster metadata
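Index settings and field mappings are supplied as JSON when an index is created. The sketch below builds such a body in Python. It assumes the typeless mapping format of Elasticsearch 7+ (5.x/6.x nested properties under a type name), and the field names and shard counts are hypothetical.

```python
import json

# Hypothetical body for PUT /<index-name> (Elasticsearch 7+ typeless mapping format).
create_index_body = {
    "settings": {
        "number_of_shards": 3,    # primary shards; cannot simply be changed after creation
        "number_of_replicas": 1,  # copies of each primary shard; can be changed later
    },
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "user": {"type": "keyword"},     # exact-match filters and aggregations
            "message": {"type": "text"},     # analyzed for full-text search
            # kept in _source and returned with hits, but not indexed, so not searchable
            "raw_payload": {"type": "text", "index": False},
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

Setting "index": false on fields you never search keeps them out of the Lucene inverted index while still storing them with the document.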
15. DATABASE VS ELASTIC
16. ELASTIC CONCEPTS – Cluster: A cluster is a collection of one or more nodes (servers) – Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities – Index: An index is a collection of documents that have somewhat similar characteristics (e.g. Product, Customer, etc.) – Type: Within an index, you can define one or more types. A type is a logical category/partition of your index (later Elasticsearch versions deprecated and then removed types). – Document: A document is a basic unit of information that can be indexed – Shard/Replica: An index is divided into multiple pieces called shards; replicas are copies of your shards
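To decide which shard a document belongs to, Elasticsearch hashes the document's routing value (its _id by default) and takes the result modulo the number of primary shards. The Python sketch below is a simplified version of that idea. Elasticsearch actually uses a Murmur3 hash and, in newer versions, an extra routing factor; zlib.crc32 here is only a stand-in.

```python
import zlib


def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash the routing value, then take it modulo the primary shard count.

    crc32 is a stand-in for the Murmur3 hash Elasticsearch really uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


shards = {doc_id: pick_shard(doc_id, 3) for doc_id in ["log-1", "log-2", "log-3", "log-4"]}
print(shards)
```

Because the modulo depends on the shard count, changing the number of primary shards would send existing documents to different shards. This is why that number is fixed when the index is created, while replicas can be added at any time.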
17. ELASTIC NODES – Master Node: controls the cluster – Data Node: Data nodes hold data and perform data-related operations such as CRUD, search, and aggregations – Ingest Node: Ingest nodes can apply an ingest pipeline to a document to transform and enrich it before indexing – Coordinating Node: only routes requests, handles the search reduce phase, and distributes bulk indexing
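During the search reduce phase, a coordinating node merges partial results from each shard. In the query phase, each shard returns only its local top hits as (score, id) pairs. The coordinating node merges these into the global top hits and then fetches the full documents. The Python sketch below shows only the merge step, with made-up scores.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def reduce_top_hits(shard_results: List[List[Hit]], size: int) -> List[Hit]:
    """Merge each shard's local top hits into the global top `size` hits.

    A shard only needs to return its own best `size` hits: a hit outside a
    shard's local top `size` cannot be in the global top `size`.
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


shard_0 = [(9.1, "a"), (4.2, "b")]
shard_1 = [(7.5, "c"), (6.0, "d")]
shard_2 = [(8.3, "e"), (1.0, "f")]
print(reduce_top_hits([shard_0, shard_1, shard_2], size=3))
# [(9.1, 'a'), (8.3, 'e'), (7.5, 'c')]
```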
18. SAMPLE JSON DOCUMENT (HTTP CALL AND JSON DOCUMENT)
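Indexing a document is a plain HTTP call with a JSON body. The Python sketch below builds such a request with the standard library but does not send it. The host, index name and document fields are hypothetical. The path uses the _doc endpoint of Elasticsearch 7+; 5.x used a custom type name in its place. Since 6.0, Elasticsearch requires a Content-Type header.

```python
import json
import urllib.request

# Hypothetical cluster address, index name and audit document.
ES_URL = "http://localhost:9200"
doc = {
    "@timestamp": "2017-09-22T14:03:00Z",
    "user": "alice",
    "action": "login",
    "result": "success",
}

# PUT /<index>/_doc/<id> stores the document under an explicit id.
req = urllib.request.Request(
    url=f"{ES_URL}/audit-2017.09.22/_doc/1",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Actually sending it requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["result"])
print(req.get_method(), req.full_url)
```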
19. ELASTICSEARCH CLUSTER
20. TYPICAL CLUSTER SHARD & REPLICA
21. SHARD SEARCH AND INDEX
22. DEMO OF ELASTICSEARCH
23. LOGSTASH – Ruby application that runs under JRuby on the JVM – Collects, parses and enriches data – Horizontally scalable – Apache 2.0 License – Large number of public plugins written by the community: https://github.com/logstash-plugins
24. TYPICAL USAGE OF LOGSTASH
26. LOGSTASH INPUT
27. LOGSTASH FILTER
28. LOGSTASH OUTPUT
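A Logstash pipeline has three stages: input, filter and output. The Python sketch below imitates that flow with generators to show what each stage does. It is not Logstash and does not use its configuration language. The log line format and the datacenter value are hypothetical. The regular expression plays the role of a grok pattern, and unparsed lines get the _grokparsefailure tag, just as grok does.

```python
import json
import re
from typing import Iterable, Iterator, List

# Hypothetical plain-text format: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def input_stage(lines: Iterable[str]) -> Iterator[str]:
    """Input: emit raw events (Logstash would read from Beats, files, queues...)."""
    for line in lines:
        yield line.rstrip("\n")


def filter_stage(events: Iterable[str], datacenter: str) -> Iterator[dict]:
    """Filter: parse each line into fields (grok-like) and enrich it."""
    for raw in events:
        match = LINE_PATTERN.match(raw)
        if match is None:
            # grok tags events it cannot parse with _grokparsefailure
            yield {"message": raw, "tags": ["_grokparsefailure"]}
            continue
        event = match.groupdict()
        event["datacenter"] = datacenter  # enrichment with a static field
        yield event


def output_stage(events: Iterable[dict]) -> List[str]:
    """Output: serialize events (Logstash would send them to Elasticsearch)."""
    return [json.dumps(event) for event in events]


raw_lines = [
    "2017-09-22T14:03:00Z INFO user alice logged in\n",
    "garbage line\n",
]
out = output_stage(filter_stage(input_stage(raw_lines), datacenter="ottawa-1"))
for line in out:
    print(line)
```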
29. DEMO LOGSTASH
30. BEATS
31. BEATS – Lightweight shippers written in Go (non-JVM shops can use them) – They follow the Unix philosophy: do one specific thing, and do it well – Filebeat: log files (think of it as tail -f on steroids) – Metricbeat: CPU and memory (like top), Redis and MongoDB usage – Packetbeat: network packet monitoring (HTTP etc.) using libpcap, the same library Wireshark uses – Winlogbeat: Windows event logs to Elasticsearch – Dockbeat: Docker monitoring – A large community offers lots of other Beats as open source
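The "tail -f on steroids" idea comes down to remembering how far into each file you have read. Filebeat persists per-file offsets in its registry so it can resume after a restart. The Python sketch below keeps the offset in a variable and holds back a half-written last line until it is complete. The file name is hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended since `offset`, plus the new offset.

    An incomplete last line (no trailing newline yet) is left for the next call.
    """
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # writer has not finished this line yet
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
            offset += len(raw)  # advance only past fully read lines
    return lines, offset


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "wb") as f:
        f.write(b"a\nb\npart")
    first, pos = read_new_lines(log_path, 0)
    with open(log_path, "ab") as f:
        f.write(b"ial\n")
    second, pos = read_new_lines(log_path, pos)

print(first, second, pos)  # ['a', 'b'] ['partial'] 12
```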
33. FILEBEAT
34. X-PACK – Elastic's commercial offering (one of the ways they make money) – X-Pack is an Elastic Stack extension that bundles: – Security (HTTPS to Elasticsearch, password to access Kibana) – Alerting – Monitoring – Reporting – Graph capabilities – Machine Learning
36. KIBANA – Visualization application for Elasticsearch (JS, Angular, D3) – Powerful dashboard frontend for visualizing index data from Elasticsearch – Uses historical data to build charts, graphs, etc. – Real-time search over indexed data
38. DEMO KIBANA
39. DESIGNS WE WENT THROUGH – We started with a simple design to measure throughput – One instance of Logstash and one instance of Elasticsearch, with Filebeat (9/22/2017)
40. .NET CORE APP – We used a .NET Core application to generate logs – Serilog wrote the logs in JSON format to a file – Filebeat was installed on the Linux machine to ship the logs to Logstash
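Logstash's elasticsearch output delivers events in batches through the _bulk API. The body of a bulk request is newline-delimited JSON: an action line, then the document, for each event, and the body must end with a newline. The Python sketch below turns JSON log lines like those written to the file into such a body. The field names and index name are hypothetical, not Serilog's exact output.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, json_lines: Iterable[str]) -> str:
    """Turn JSON log lines into an Elasticsearch _bulk request body (NDJSON)."""
    parts = []
    for line in json_lines:
        doc = json.loads(line)  # each log line must be a single JSON object
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps(doc))  # source line
    # every line, including the last one, is terminated by a newline
    return "".join(part + "\n" for part in parts)


log_lines = [
    '{"Timestamp": "2017-09-22T14:03:00Z", "Level": "Information", "User": "alice"}',
    '{"Timestamp": "2017-09-22T14:03:01Z", "Level": "Warning", "User": "bob"}',
]
body = build_bulk_body("audit-2017.09.22", log_lines)
print(body, end="")
```

The result would be sent as POST /_bulk with Content-Type: application/x-ndjson. Batching many documents per request is much cheaper than one HTTP call per log line.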
41. PERFORMANCE ELASTIC – 250 log items per second for 30 minutes
42. OVERVIEW
43. LOGSTASH
44. ELASTICSEARCH RUN TWO – 1000 logs per second, run for 30 minutes
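It helps to compare the two test runs with the production volume of about 2 million audit records a day. The arithmetic below uses only the rates and durations stated on the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TEST_DURATION_S = 30 * 60       # each run lasted 30 minutes


def docs_in_run(rate_per_s: int, duration_s: int = TEST_DURATION_S) -> int:
    """Total documents ingested at a constant rate."""
    return rate_per_s * duration_s


run_one = docs_in_run(250)    # 450,000 documents
run_two = docs_in_run(1000)   # 1,800,000 documents

# Production: about 2 million audit records a day, averaged over 24 hours
avg_rate = 2_000_000 / SECONDS_PER_DAY  # about 23.1 records per second

print(run_one, run_two, round(avg_rate, 1))  # 450000 1800000 23.1
```

So run two pushed about 90% of a full day's records through the pipeline in 30 minutes. The daily average hides peaks, which is why the tests ran at well above 23 records per second.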
45. PERFORMANCE
46. OTHER DESIGNS
47. WHAT WE ARE GOING WITH FOR NOW, UNTIL…..
48. CONSIDERATIONS OF DATA – Indexing by day makes sense in some cases – In others you may want to index by size instead (Black Friday has more traffic than other days); Elasticsearch doesn’t like it when shards are not balanced – Don’t index everything: if you are not going to search on specific fields, mark them as not indexed ("index": false in the field mapping)
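Both indexing strategies can be reduced to a small decision. The Python sketch below covers each case. Daily naming follows the same shape as Logstash's default logstash-YYYY.MM.dd indices. The size-based check mimics the kind of conditions (size, document count, age) that Elasticsearch's rollover API accepts. The prefix and both thresholds are hypothetical.

```python
from datetime import datetime, timezone


def daily_index_name(prefix: str, ts: datetime) -> str:
    """One index per UTC day, e.g. audit-2017.09.22. Pass a timezone-aware datetime."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def should_roll_over(index_size_bytes: int, doc_count: int,
                     max_size_bytes: int = 50 * 1024**3,
                     max_docs: int = 50_000_000) -> bool:
    """Size-based alternative: start a new index once the current one is too big.

    The default thresholds are hypothetical, not recommendations.
    """
    return index_size_bytes >= max_size_bytes or doc_count >= max_docs


print(daily_index_name("audit", datetime(2017, 11, 24, 15, 0, tzinfo=timezone.utc)))
print(should_roll_over(index_size_bytes=60 * 1024**3, doc_count=1_000_000))
```

With daily indices, a heavy day produces one oversized index. Rolling over by size keeps indices (and therefore shards) closer to the same size regardless of traffic.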
49. FUTURE CONSIDERATIONS – Investigate Elasticsearch machine learning – Elasticsearch with Kafka for cross-data-center replication
50. THANK YOU & OPEN TO QUESTIONS – Questions??? – Contact: Taswar@gmail.com – Blog: http://Taswar.zeytinsoft.com – Twitter: @taswarbhatti – LinkedIn (find me and add me)